A myocardial infarction (MI), commonly known as a heart attack, occurs when blood flow to the coronary artery of the heart decreases or stops, causing damage to the heart muscle. More broadly, acute coronary syndrome (ACS), which refers to any constellation of clinical symptoms compatible with acute myocardial ischemia, is among the leading causes of death globally as well as in the US, according to reports from the WHO and CDC. According to the WHO, ischemic heart disease (IHD), or coronary artery disease (CAD), continues to be one of the leading causes of mortality across the globe despite advances in diagnosis and treatment. The high mortality rates and the financial burden associated with acute MI can be significantly reduced with timely and appropriate diagnosis.
A rapid diagnostic test using relevant biomarkers can facilitate an efficient triage of patients with MI or the possibility of MI in case of an inconclusive troponin test and electrocardiogram (ECG, also referred to as EKG) readout. In the initial period of an MI, inconclusive troponin and ECG results can delay critical treatment procedures and increase the burden of patients under observation in the emergency department (ED). A new and accurate diagnostic test that is easy to use in an ED setting and is effective at ruling out the possibility of MI in patients can be immensely beneficial. Rapid and early rule-out of MI can lower the financial and medical costs by avoiding unnecessary hospitalization, expensive investigative procedures, and treatments. Conversely, a quick rule-in of patients undergoing MI would trigger more timely advancement to further tests and interventions, such as surgical procedures, that may be more complex and resource consuming. Reduction of mortality rates may be possible by providing timely intervention.
There is thus a need for new, accurate, rapid, and affordable methods for identifying patients who have MI. The present disclosure satisfies this need and provides other advantages as well.
In one aspect, the present disclosure provides a method for a patient suspected of undergoing a myocardial infarction (MI), the method comprising: a) receiving a biological sample obtained from a patient suspected of undergoing the MI; b) measuring one or more expression levels of at least one biomarker selected from Table 2; c) generating a composite biomarker value based on the one or more expression levels of the at least one biomarker in the biological sample; and d) determining, based on the composite biomarker value, whether the patient is undergoing the MI.
In another aspect, the present disclosure provides a method for a patient suspected of undergoing a myocardial infarction (MI), the method comprising: a) receiving a biological sample obtained from the patient suspected of undergoing the MI; b) measuring one or more expression levels of at least one biomarker selected from Table 2; and c) determining whether the patient is undergoing the MI based on the one or more expression levels of the at least one biomarker in the biological sample. In some embodiments, step c) comprises generating a composite biomarker value based on the one or more expression levels of the at least one biomarker in the biological sample, and determining, based on the composite biomarker value, whether the patient is undergoing the MI.
In some embodiments, step b) comprises measuring the expression levels of at least two biomarkers selected from Table 2. In some embodiments, the at least one biomarker is selected from the group consisting of SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3. In some embodiments, the at least one biomarker is selected from the group consisting of SOCS3, SCX, AQP9, IPPK, CD163, ATP1B1, ZCCHC18, and CXorf65. In some embodiments, the at least one biomarker is selected from the group consisting of OVCA2, MEF2A, PRR33, TNFSF8, MEF2B, HINT2, and IMP3.
In some embodiments of the aforementioned aspects, the composite biomarker value not exceeding a threshold value indicates that the patient is not undergoing the MI, wherein the threshold value is determined using training samples of patients that are determined to not be undergoing the MI by a separate testing procedure. In certain embodiments, the method further comprises the step of determining that the patient is not undergoing the MI by comparing the composite biomarker value to the threshold value. In particular embodiments, the patient is further evaluated by a physician for a condition selected from the group consisting of anxiety disorder, aortic dissection, aortic stenosis, asthma, esophagitis, heart failure, gastroenteritis, left ventricular embolism, musculoskeletal disorder, myocarditis, pericarditis, pneumonia, pulmonary embolism, and unstable angina. In some embodiments, the method can further include the step of discharging the patient from a clinical facility based on determining that the patient is not undergoing the MI.
In some embodiments of the aforementioned aspects, the composite biomarker value exceeding a threshold value indicates that the patient is undergoing the MI and is a candidate for an additional cardiovascular diagnostic testing, a therapeutic intervention, or both, wherein the threshold value is determined using training samples of patients that are determined to be undergoing the MI by a separate testing procedure. In some embodiments, the method further comprises the step of determining that the patient is undergoing the MI by comparing the composite biomarker value to the threshold value and that the patient is a candidate for the additional cardiovascular diagnostic testing, the therapeutic intervention, or both. In some embodiments, the method comprises the step f) of subjecting the patient to the additional cardiovascular diagnostic testing, the therapeutic intervention, or both. In particular embodiments, the additional cardiovascular diagnostic testing comprises one or more of angiography, myocardial perfusion imaging, echocardiography, and magnetic resonance imaging. The angiography can be non-invasive computed tomography (CT) angiography or invasive coronary angiography (ICA). The therapeutic intervention can comprise administration of a pharmaceutical compound, an interventional procedure, or both. The pharmaceutical compound can be an anticoagulant, an antiplatelet, a beta-blocker, a nitrate, a statin, an angiotensin-converting-enzyme (ACE) inhibitor, or an angiotensin receptor blocker (ARB). The interventional procedure can comprise revascularization (e.g., a percutaneous coronary intervention (PCI) or a coronary artery bypass graft (CABG)).
In some embodiments, the biological sample is obtained in response to the patient experiencing chest pain. In some embodiments, the biological sample is obtained in response to the patient having a cardiac troponin level above a cardiac troponin threshold value. In certain embodiments, the patient has serial cardiac troponin levels above a cardiac troponin level threshold value for at least 3 hours. In certain embodiments, the cardiac troponin threshold value is the 99th percentile upper reference limit value (as described further herein).
In some embodiments of the aforementioned aspects, the MI is ST-elevation myocardial infarction (STEMI). In other embodiments, the MI is non-ST-elevation myocardial infarction (NSTEMI).
In some embodiments, the biological sample is whole blood. In some embodiments, the biological sample is a blood component or a blood fraction such as red blood cells, white blood cells, platelets, peripheral blood mononucleated cells (PBMC), a band cell, a neutrophil, a monocyte, a T cell, or a combination thereof. In some embodiments, the biological sample is serum and/or plasma.
In some embodiments, the expression level of the at least one biomarker is detected using polymerase chain reaction (PCR) (e.g., quantitative PCR (qPCR), droplet digital PCR (ddPCR), reverse transcription PCR (RT-PCR), quantitative RT-PCR (qRT-PCR)), isothermal amplification (e.g., loop-mediated isothermal amplification (LAMP), reverse transcription LAMP (RT-LAMP), quantitative RT-LAMP (qRT-LAMP)), RPA amplification, ligase chain reaction, branched DNA amplification, nucleic acid sequence-based amplification (NASBA), strand displacement assay (SDA), transcription-mediated amplification, rolling circle amplification (RCA), helicase-dependent amplification (HDA), single primer isothermal amplification (SPIA), nicking and extension amplification reaction (NEAR), transcription mediated assay (TMA), CRISPR-Cas detection, and/or direct hybridization without amplification onto a functionalized surface (e.g., using a graphene biosensor).
In some embodiments, the expression level of the at least one biomarker is detected using qRT-PCR. In some embodiments, the expression level of the at least one biomarker is detected using qRT-LAMP.
In another aspect, the disclosure features a test kit for detecting the expression levels of one or more biomarkers in a biological sample of a patient suspected of undergoing a myocardial infarction (MI), wherein the one or more biomarkers comprise at least one biomarker from Table 2.
In certain embodiments of this aspect, the test kit comprises an oligonucleotide for each of the one or more biomarkers, wherein the oligonucleotide hybridizes to the biomarker or a transcript thereof. In some embodiments, the one or more biomarkers are selected from the group consisting of SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3. The kit can further include, e.g., one or more reagents for performing polymerase chain reaction (PCR) (e.g., reverse transcription PCR (RT-PCR)) or loop-mediated isothermal amplification (LAMP) (e.g., reverse transcription LAMP (RT-LAMP)). The PCR or LAMP can be quantitative.
In some embodiments, the kit is for detecting ST-elevation myocardial infarction (STEMI). In other embodiments, the kit is for detecting non-ST-elevation myocardial infarction (NSTEMI).
The biological sample can be whole blood. In some embodiments, the biological sample is a blood component or a blood fraction such as red blood cells, white blood cells, platelets, peripheral blood mononucleated cells (PBMC), a band cell, a neutrophil, a monocyte, a T cell, or a combination thereof. In some embodiments, the biological sample is serum and/or plasma.
The kit can further include instructions to calculate a composite biomarker value based on the expression levels of the one or more biomarkers in the biological sample of the patient, wherein the composite biomarker value, when compared to a threshold value, indicates whether the patient is undergoing the MI.
In another aspect, the disclosure provides a method for providing an indication for a myocardial infarction (MI) in a subject, the method comprising: (a) measuring expression levels of one or more biomarkers selected from Table 2 or one or more biomarker pairs selected from Table 3 in a biological sample obtained from the subject; (b) evaluating the expression levels of the one or more biomarkers to yield a composite biomarker value, wherein the composite biomarker value exceeding a threshold value indicates that the subject is undergoing the MI; and (c) administering medical care to the subject. In another aspect, the disclosure provides a method for diagnosing a myocardial infarction (MI) in a subject, the method comprising (a) measuring expression levels of one or more biomarkers selected from Table 2 or one or more biomarker pairs selected from Table 3 in a biological sample obtained from the subject; (b) evaluating the expression levels of the one or more biomarkers to yield a composite biomarker value, wherein the composite biomarker value exceeding a threshold value indicates that the subject is undergoing the MI; and (c) administering medical care to the subject.
In some embodiments of the aforementioned aspects, the one or more biomarkers are selected from the group consisting of SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3.
In some embodiments of the aforementioned aspects, the composite biomarker value has been validated in multiple cohorts. In particular embodiments, an area under the receiver operating characteristic (ROC) curve for the identification of subjects with MI for the composite biomarker value is at least 0.75 in a cohort independent of the cohort from which the composite biomarker value was derived.
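By way of non-limiting illustration, the area under the ROC curve for a composite biomarker value could be computed in a validation cohort as sketched below. The sketch uses the rank-based (Mann-Whitney) formulation of the area under the curve; the function name `roc_auc` and the example scores are hypothetical and are not taken from the disclosure.

```python
def roc_auc(mi_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen MI score exceeds a
    randomly chosen control score, with ties counted as one half.
    """
    if not mi_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for m in mi_scores:
        for c in control_scores:
            if m > c:
                wins += 1.0
            elif m == c:
                wins += 0.5
    return wins / (len(mi_scores) * len(control_scores))


# Hypothetical composite biomarker values from an independent cohort.
mi = [2.1, 1.7, 0.9, 2.5]
controls = [0.4, 1.0, 0.2, 1.8]

auc = roc_auc(mi, controls)
# Compare against the 0.75 level mentioned in the text above.
meets_criterion = auc >= 0.75
```

In this toy cohort, 13 of the 16 MI/control pairs are ordered correctly, so the computed area is 13/16.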
A better understanding of the nature and advantages of embodiments of the present disclosure may be gained with reference to the following detailed description and the accompanying drawings.
As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
The terms “a,” “an,” or “the” as used herein not only include aspects with one member, but also include aspects with more than one member. For instance, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a cell” includes a plurality of such cells and reference to “the agent” includes reference to one or more agents known to those skilled in the art, and so forth.
The terms “about” and “approximately” as used herein shall generally mean an acceptable degree of error for the quantity measured given the nature or precision of the measurements. Typically, exemplary degrees of error are within 20 percent (%), preferably within 10%, and more preferably within 5% of a given value or range of values. Any reference to “about X” specifically indicates at least the values X, 0.8X, 0.81X, 0.82X, 0.83X, 0.84X, 0.85X, 0.86X, 0.87X, 0.88X, 0.89X, 0.9X, 0.91X, 0.92X, 0.93X, 0.94X, 0.95X, 0.96X, 0.97X, 0.98X, 0.99X, 1.01X, 1.02X, 1.03X, 1.04X, 1.05X, 1.06X, 1.07X, 1.08X, 1.09X, 1.1X, 1.11X, 1.12X, 1.13X, 1.14X, 1.15X, 1.16X, 1.17X, 1.18X, 1.19X, and 1.2X. Thus, “about X” is intended to teach and provide written description support for a claim limitation of, e.g., “0.98X.”
The term “nucleic acid” or “polynucleotide” refers to primers, probes, oligonucleotides, template RNA or cDNA, genomic DNA, amplified subsequences of biomarker genes, or any polynucleotide composed of deoxyribonucleic acids (DNA), ribonucleic acids (RNA), or any other type of polynucleotide which is an N-glycoside of a purine or pyrimidine base, or modified purine or pyrimidine bases in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, SNPs, and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions can be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues. “Nucleic acid”, “DNA,” “polynucleotides”, and similar terms also include nucleic acid analogs. The polynucleotides are not necessarily physically derived from any existing or natural sequence, but can be generated in any manner, including chemical synthesis, DNA replication, reverse transcription or a combination thereof.
The term “primer” as used herein refers to an oligonucleotide, whether occurring naturally or produced synthetically, that is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced, i.e., in the presence of nucleotides (e.g., naturally occurring nucleotides or modified nucleotides), and an agent for polymerization such as DNA polymerase and at a suitable temperature and buffer. Such conditions include the presence of four different deoxyribonucleoside triphosphates and a polymerization-inducing agent such as DNA polymerase or reverse transcriptase, in a suitable buffer (“buffer” includes substituents which are cofactors, or which affect pH, ionic strength, etc.), and at a suitable temperature. The primer is preferably single-stranded for maximum efficiency in amplification such as a TaqMan real-time quantitative RT-PCR as described herein. Primers may be double-stranded and dissociated into single-strands at a primer melting temperature prior to amplification. The primers herein are selected to be substantially complementary to the different strands of each specific sequence to be amplified, and a given set of primers will act together to amplify a subsequence of the corresponding biomarker gene.
The term “gene” refers to the segment of DNA involved in producing a polypeptide chain. It can include regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons).
The term “myocardial infarction” or “MI” refers to the irreversible necrosis (e.g., death) of heart muscle or cardiac cells secondary to prolonged ischemia that occurs when blood flow decreases or stops to the coronary artery of the heart. This usually results from an imbalance of oxygen supply and demand. One of the most common symptoms of MI is chest pain or discomfort which may travel into other parts of the body, e.g., the shoulder, arm, back, neck, and jaw. In some incidents, the symptoms occur in the center or left side of the chest and last for more than a few minutes. The discomfort may occasionally feel like heartburn. Other symptoms may include, for example, shortness of breath, nausea, feeling faint, a cold sweat, and/or feeling tired. An MI may cause heart failure, an irregular heartbeat, cardiogenic shock, and/or cardiac arrest. Some risk factors of MI include, for example, high blood pressure, smoking, diabetes, lack of exercise, obesity, high blood cholesterol, poor diet, and excessive alcohol intake. Typically, a number of tests are useful to help with diagnosis, including electrocardiograms (ECGs), blood tests (i.e., to test for levels of cardiac enzymes), and coronary angiography. An ECG, which is a recording of the heart's electrical activity, may confirm an ST-elevation MI (STEMI), if ST elevation is present. Commonly used blood tests can test for, for example, levels of troponin and creatine kinase MB. In some cases, the appearance of cardiac enzymes in the circulation can indicate myocardial necrosis.
A “biological sample” refers to a biological specimen obtained from a subject containing, e.g., fluids, cells, or tissues from the subject. For the purposes of the present methods and compositions, a biological sample is taken from a subject suspected of undergoing an MI, and in particular embodiments the sample is whole blood sample or a component of whole blood. Examples of a component of whole blood include, but are not limited to, red blood cells, white blood cells, platelets, peripheral blood mononucleated cells (PBMC), a band cell, a neutrophil, a monocyte, a T cell, or a combination thereof. In other embodiments, the biological sample is serum and/or plasma.
As used herein, a “biomarker gene”, “biomarker mRNA”, or “biomarker” refers to a gene whose expression level in a biological sample of a subject (e.g., a whole blood sample) can be used for diagnosing whether a subject is undergoing an MI. The expression level of each of the genes need not be correlated with a patient undergoing MI in all patients; rather, a correlation will exist at the population level, such that the level of expression is sufficiently correlated within the overall population of individuals undergoing MI that it can be combined with the expression levels of other biomarker genes, in any of a number of ways, as described elsewhere herein, and used to calculate a composite biomarker value. The values used for the measured expression level of the individual biomarker genes can be determined in any of a number of ways, including direct readouts from relevant instruments or assay systems or reporter systems, or values determined using methods including, but not limited to, forms of linear or non-linear transformation, rescaling, normalizing, z-scores, ratios against a common reference value, or any other means known to those of skill in the art. In some embodiments, the readout values of the biomarkers are compared to the readout value of a reference or control, e.g., a housekeeping gene whose expression is measured at the same time as the biomarkers. For example, the ratio or log ratio of the biomarkers to the reference gene can be determined. In some embodiments, biomarker genes for the purposes of the present methods include, e.g., SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3, but others can be used as well, e.g., those presented in Table 2.
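As one non-limiting sketch of the log-ratio normalization described above, the readout of each biomarker can be divided by the readout of a reference gene measured in the same run and the result expressed on a log2 scale. The function name `log2_ratio`, the reference label `REF`, and all readout values below are hypothetical placeholders chosen for illustration.

```python
import math


def log2_ratio(readouts, reference_key):
    """Return log2(biomarker / reference) for every biomarker readout.

    `readouts` maps a gene label to a positive instrument readout;
    `reference_key` names the housekeeping (reference) gene entry.
    """
    reference = readouts[reference_key]
    if reference <= 0:
        raise ValueError("reference readout must be positive")
    normalized = {}
    for gene, value in readouts.items():
        if gene == reference_key:
            continue  # the reference gene is not itself reported
        if value <= 0:
            raise ValueError(f"readout for {gene} must be positive")
        normalized[gene] = math.log2(value / reference)
    return normalized


# Hypothetical raw readouts; "REF" stands in for a housekeeping gene.
raw = {"SOCS3": 800.0, "AQP9": 400.0, "HINT2": 50.0, "REF": 200.0}
normalized = log2_ratio(raw, "REF")
```

A positive value indicates a readout above the reference gene and a negative value a readout below it; the same dictionary could then feed a z-score or other rescaling step if desired.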
The term “composite biomarker value” or “biomarker score” refers to a value allowing a determination of a patient undergoing MI. The composite biomarker value is calculated from the measured expression levels or other readouts of one or a plurality of biomarker genes, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more individual biomarker genes, in a biological sample (e.g., whole blood sample) from a subject. In some embodiments, the composite biomarker value is determined by applying a mathematical formula, or a series of mathematical formulae with specified interconnections, or a machine learning algorithm with optimized hyperparameters, or another parameter-based method by which the measured expression values of the biomarker genes can be used to generate a single “composite biomarker value”, including, e.g., arithmetic or geometric means with or without weights, linear regression, logistic regression, neural nets, or any other method known in the art. In particular embodiments, the “composite biomarker value” is used to determine whether the patient is undergoing MI by virtue of the composite biomarker value surpassing or not a given threshold value for the outcome in question, as described in more detail elsewhere herein. In some embodiments, the “composite biomarker value” is further converted to a metric value representing the probability of undergoing MI via one or more calibration procedures. The calibration procedure(s) may output a threshold value for diagnosis. In some embodiments, the composite biomarker value can be further combined with other factors, such as the presence or severity of specific symptoms, patient factors (e.g., age, sex, vital signs, comorbidities, prior treatment history, or other relevant clinical parameters) to improve the performance or predictive value of the composite biomarker value in determining whether a patient is undergoing MI.
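For illustration only, one of the options listed above (a weighted linear combination, optionally passed through a logistic function to give a probability-like value) might be sketched as follows. The weights, intercept, threshold, and input levels are hypothetical and do not represent a trained model from the disclosure; the function names are likewise assumptions.

```python
import math


def composite_value(levels, weights, intercept=0.0):
    """Weighted linear combination of normalized expression levels."""
    missing = set(weights) - set(levels)
    if missing:
        raise KeyError(f"missing expression levels for: {sorted(missing)}")
    return intercept + sum(weights[g] * levels[g] for g in weights)


def to_probability(score):
    """Logistic transform of a score into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def classify(score, threshold):
    """A score exceeding the threshold indicates MI; otherwise it does not."""
    return "MI indicated" if score > threshold else "MI not indicated"


# Hypothetical normalized levels and illustrative (untrained) weights.
levels = {"SOCS3": 2.0, "AQP9": 1.0, "HINT2": -2.0}
weights = {"SOCS3": 0.5, "AQP9": 0.5, "HINT2": -0.25}

score = composite_value(levels, weights)      # 1.0 + 0.5 + 0.5
probability = to_probability(score)
call = classify(score, threshold=1.0)         # threshold is hypothetical
```

A score equal to the threshold is treated as not exceeding it, consistent with the "exceeding" and "not exceeding" language used elsewhere herein.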
The term “correlating” generally refers to determining a relationship between one random variable with another. In various embodiments, correlating a given composite biomarker value with the presence or absence of a condition or outcome (e.g., undergoing MI) in a subject comprises determining the expression level of at least one biomarker in the subject, using the expression level to calculate a composite biomarker value, and comparing the composite biomarker value with a threshold value, which can be a composite biomarker value from a control subject, such as a healthy subject or a subject who had undergone MI. In specific embodiments, a composite biomarker value calculated based on the expression levels of a set of biomarkers is correlated to the presence or absence of a particular outcome, using receiver operating characteristic (ROC) curves.
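The disclosure refers to ROC curves without prescribing how a threshold value is chosen from them. Purely as an assumed example, the sketch below tabulates sensitivity and specificity at each candidate threshold in a training set and picks the threshold that maximizes Youden's J statistic (sensitivity + specificity - 1). The function names and scores are hypothetical, and other criteria, such as a minimum sensitivity for rule-out, could be substituted.

```python
def roc_points(mi_scores, control_scores):
    """List (threshold, sensitivity, specificity) for each observed score.

    A sample is called positive when its score exceeds the threshold.
    """
    points = []
    for t in sorted(set(mi_scores) | set(control_scores)):
        sensitivity = sum(s > t for s in mi_scores) / len(mi_scores)
        specificity = sum(s <= t for s in control_scores) / len(control_scores)
        points.append((t, sensitivity, specificity))
    return points


def youden_threshold(mi_scores, control_scores):
    """Threshold maximizing sensitivity + specificity - 1 (Youden's J)."""
    best = max(roc_points(mi_scores, control_scores),
               key=lambda p: p[1] + p[2] - 1.0)
    return best[0]


# Hypothetical composite biomarker values from training samples.
train_mi = [1.9, 2.4, 0.7, 3.0, 2.2]
train_controls = [0.3, 0.8, 1.5, 0.6, 1.0]

threshold = youden_threshold(train_mi, train_controls)
```

With these toy values, the selected threshold is 1.5, at which four of five MI samples exceed the threshold and no control sample does.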
The present disclosure provides methods and compositions for determining whether a subject is undergoing myocardial infarction (MI). The present methods and compositions involve biomarkers identified from transcriptomic data of blood samples from individuals with MI and controls, e.g., controls from risk populations (e.g., risk subjects with CAD or stable/unstable angina) or healthy controls. Meta-analysis was used to identify circulating biomarkers from blood that are differentially expressed in the presence of an MI event. A total of 480 mRNAs were identified that can robustly distinguish MI samples from the control samples with an accuracy suitable for the clinical setting. Of these, 15 mRNAs were further down-selected as a parsimonious gene set using a greedy forward search procedure. Diagnostic tests can be established using subsets of the 480 identified mRNA biomarkers. Such diagnostic tests can be utilized in clinics and emergency departments to enable clinicians to make better decisions about whether a subject is undergoing MI and to subsequently employ the appropriate testing and/or therapeutic interventions.
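The details of the greedy forward search procedure are not given in this passage. The following is a minimal sketch of a generic greedy forward selection, under the assumptions that the score of a gene set is the arithmetic mean of its members' expression levels and that the selection criterion is the area under the ROC curve; the gene labels, data, and function names are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC; labels are 1 for MI and 0 for control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_score(expr, genes):
    """Per-sample arithmetic mean of the selected genes' levels."""
    n_samples = len(next(iter(expr.values())))
    return [sum(expr[g][i] for g in genes) / len(genes)
            for i in range(n_samples)]


def greedy_forward_search(expr, labels, max_genes):
    """Add, one at a time, the gene giving the largest AUC gain.

    Stops when no remaining gene improves the AUC or when
    `max_genes` genes have been selected.
    """
    selected, best_auc = [], 0.0
    while len(selected) < max_genes:
        candidates = [g for g in expr if g not in selected]
        if not candidates:
            break
        scored = [(auc(mean_score(expr, selected + [g]), labels), g)
                  for g in candidates]
        top_auc, top_gene = max(scored, key=lambda item: item[0])
        if top_auc <= best_auc:
            break  # no candidate improves the current set
        selected.append(top_gene)
        best_auc = top_auc
    return selected, best_auc


# Hypothetical expression levels for six samples (three MI, three control).
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "GENE_A": [3.0, 1.0, 2.0, 0.0, 1.5, 0.5],
    "GENE_B": [0.0, 2.0, 0.5, 0.5, 0.0, 1.0],
    "GENE_C": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}
selected, best_auc = greedy_forward_search(expr, labels, max_genes=3)
```

In this toy data set the search picks GENE_A first, then adds GENE_B because the pair separates the groups completely, and stops before adding the uninformative GENE_C.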
The present methods and compositions can be used to determine whether a subject is undergoing MI. In various embodiments, the subject may be an adult of any age, a child, or an adolescent. The subject may be male or female.
In particular embodiments, the subject has one or more of the following symptoms: chest pain or discomfort; feeling weak, light-headed, or faint; developing cold sweat; pain or discomfort in the jaw, neck, and/or back; pain or discomfort in one or both arms and/or shoulders; and/or shortness of breath. In some embodiments, the chest pain or discomfort is in the center or left side of the chest. In some embodiments, the chest pain or discomfort lasts for more than a few minutes (e.g., at least 2, 5, 10, 20, or 30 minutes). In some embodiments, the chest pain or discomfort goes away and comes back. In certain embodiments, the chest pain or discomfort can be uncomfortable pressure, squeezing, fullness, or pain.
In particular embodiments, the subject is present in a medical context, e.g., a clinical setting where diagnosis and/or treatment may take place. A clinical setting does not necessarily indicate that the patient is physically present in a hospital or clinical facility, however. For example, the patient may be at home but has been in communication with a health care provider about his or her condition and its treatment.
The results of the methods described herein can allow a determination of the optimal next step or plan of action for the subject's care. In some embodiments, the determination is that the subject is not undergoing MI. In particular embodiments, the determination is that the subject is not undergoing MI even if the subject presents one or more of the symptoms described above (e.g., chest pain or discomfort). In particular embodiments, the determination is that a subject is undergoing MI. In certain embodiments, the determination is that the subject is undergoing MI when the subject presents one or more of the symptoms described above (e.g., chest pain or discomfort).
To assess the biomarker status of the subject, a biological sample is obtained. In some embodiments, the sample is a blood sample, e.g., plasma, serum, or whole blood sample. In particular embodiments, the sample is a whole blood sample obtained from the subject. In some embodiments, the biological sample is a blood component such as red blood cells, white blood cells, platelets, peripheral blood mononucleated cells (PBMC), a band cell, a neutrophil, a monocyte, a T cell, or a combination thereof. In some embodiments, the biological sample is serum and/or plasma.
Other potential samples that can be used include urine, ascites, seminal fluid, vaginal secretions, cerebrospinal fluid (CSF), synovial fluid, pleural fluid (pleural lavage), pericardial fluid, peritoneal fluid, amniotic fluid, saliva, nasal fluid, otic fluid, gastric fluid, breast milk, bile, gastric juice, lymph, mucus, pus, sebum, serous fluid, sputum, sweat, tears, and others. The biological sample, e.g., a whole blood sample, can be obtained from the subject using conventional techniques known in the art. In some embodiments, the sample is obtained for the purposes of assessing the subject's biomarker status using the herein-described methods.
In some embodiments, the biological sample, e.g., a whole blood sample, can be obtained at different times from the subject. For example, the biological sample, e.g., a whole blood sample or a component thereof, can be obtained in response to the subject having a cardiac troponin level above a cardiac troponin threshold value (e.g., a cardiac troponin threshold value is the 99th percentile upper reference limit value). In particular embodiments, the subject has serial cardiac troponin levels above a cardiac troponin level threshold value (e.g., a cardiac troponin threshold value is the 99th percentile upper reference limit value) over the course of at least 3 hours (e.g., at least 4, 5, 6, 7, 8, 9, or 10 hours). In some embodiments, cardiac troponin threshold values can be determined using commercially available troponin tests. Some examples of commercially available tests and their 99th percentile troponin values include, e.g., ARCHITECT STAT High Sensitive Troponin-I by Abbott (99th percentile troponin value at 15.6 ng/L for female and 34.2 ng/L for male); Access hsTnI by Beckman Coulter (99th percentile troponin value at 11.6 ng/L for female and 19.8 ng/L for male); VIDAS High Sensitive Troponin I by bioMérieux (99th percentile troponin value at 11 ng/L for female and 25 ng/L for male); Pylon hsTnI assay by ET Healthcare (99th percentile troponin value at 21 ng/L for female and 27 ng/L for male); Pylon hsTnT by ET Healthcare (99th percentile troponin value at 13 ng/L for female and 14 ng/L for male); Lumipulse G G1200 and G60011 hsTnI by Fujirebio (99th percentile troponin value at 22.4 ng/L for female and 32.9 ng/L for male); cobas e601, e602, E170/TnT Gen 5 STAT by Roche (99th percentile troponin value at 14 ng/L for female and 22 ng/L for male); ATELLICA High-Sensitivity TnI (TnIH) by Siemens (99th percentile troponin value at 38.6 ng/L for female and 53.5 ng/L for male); and Singulex Clarity cTnI (99th percentile troponin value at 8.76 ng/L for female and 9.23 ng/L for male). Other methods of determining troponin levels and troponin threshold values are described in, e.g., Apple, Clin Chem 2010; 56:886-91; Apple and Collinson, Clin Chem 2012; 58:54-61; Wu and Christenson, Clin Biochem 2013; 46:969-78; and Wu et al., Clin Chem 2009; 55:52-8.
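As a non-limiting sketch, the serial troponin criterion described above could be checked in software as follows. The sex-specific 99th percentile values for the two assays shown are those recited above; the function name, the data layout, and the interpretation that every draw must exceed the limit over a first-to-last span of at least 3 hours are assumptions made for illustration.

```python
# 99th percentile upper reference limits (ng/L) recited above for two assays.
URL_99TH_NG_PER_L = {
    "Abbott ARCHITECT STAT hsTnI": {"female": 15.6, "male": 34.2},
    "Beckman Coulter Access hsTnI": {"female": 11.6, "male": 19.8},
}


def persistently_elevated(measurements, assay, sex, min_hours=3.0):
    """Check whether serial troponin stays above the assay's limit.

    `measurements` is a list of (hours_since_first_draw, ng_per_L) pairs.
    Assumed rule: at least two draws, all above the sex-specific limit,
    spanning at least `min_hours` from first to last draw.
    """
    limit = URL_99TH_NG_PER_L[assay][sex]
    if len(measurements) < 2:
        return False
    ordered = sorted(measurements)
    all_above = all(value > limit for _, value in ordered)
    span_hours = ordered[-1][0] - ordered[0][0]
    return all_above and span_hours >= min_hours


# Hypothetical serial draws for two subjects.
draws_a = [(0.0, 40.1), (1.5, 52.3), (3.0, 61.0)]
draws_b = [(0.0, 20.0), (3.0, 14.0)]

flag_a = persistently_elevated(draws_a, "Abbott ARCHITECT STAT hsTnI", "male")
flag_b = persistently_elevated(draws_b, "Abbott ARCHITECT STAT hsTnI", "female")
```

Here the first subject meets the assumed criterion, while the second does not because the final value falls below the female limit of 15.6 ng/L.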
In another example, the biological sample, e.g., a whole blood sample or component thereof, can be obtained from the subject before, during, and/or after the subject is actively experiencing one or more of the symptoms described above, e.g., chest pain or discomfort. In certain embodiments, the biological sample, e.g., a whole blood sample, can be obtained from the subject minutes or hours (e.g., 5, 10, 20, 30, 40, or 50 minutes; e.g., 1, 2, 3, 4, or 5 hours) before the subject is actively experiencing one or more of the symptoms described above, e.g., chest pain or discomfort. In certain embodiments, the biological sample, e.g., a whole blood sample, can be obtained from the subject minutes or hours (e.g., 5, 10, 20, 30, 40, or 50 minutes; e.g., 1, 2, 3, 4, or 5 hours) after the subject has experienced one or more of the symptoms described above, e.g., chest pain or discomfort. In certain embodiments, the biological sample, e.g., a whole blood sample or component thereof, can be obtained from the subject while the subject is actively experiencing one or more of the symptoms described above, e.g., chest pain or discomfort. The biological sample, e.g., a whole blood sample, can be obtained by the same caregiver or clinical facility as that carrying out the herein-described methods, or can be obtained from a different source (e.g., different caregiver or clinical facility).
As described herein, the methods for a subject suspected of undergoing a myocardial infarction (MI) can include: a) receiving a biological sample obtained from a subject suspected of undergoing the MI; b) measuring one or more expression levels of at least one biomarker selected from Table 2; c) generating a composite biomarker value based on the one or more expression levels of the at least one biomarker in the sample; and d) determining, based on the composite biomarker value, whether the subject is undergoing the MI. In some embodiments of the methods, the composite biomarker value exceeding a threshold value indicates that the subject is undergoing the MI and is a candidate for an additional cardiovascular diagnostic testing, a therapeutic intervention, or both. In some embodiments, the threshold value is determined using training samples of subjects that are determined to be undergoing the MI by a separate testing procedure. The methods can further comprise the step e) of determining that the subject is undergoing the MI by comparing the composite biomarker value to the threshold value and that the subject is a candidate for the additional cardiovascular diagnostic testing, the therapeutic intervention, or both. The methods can further comprise the step f) of subjecting the subject to the additional cardiovascular diagnostic testing, the therapeutic intervention, or both. The additional cardiovascular diagnostic testing can comprise one or more of angiography, myocardial perfusion imaging, echocardiography, and magnetic resonance imaging. As used herein, the terms “composite biomarker value” and “biomarker score” are interchangeable.
In some embodiments, the MI is ST-elevation myocardial infarction (STEMI). In some embodiments, the MI is non-ST-elevation myocardial infarction (NSTEMI).
The determination that the subject is undergoing MI can be determined by calculating a composite biomarker value based on the expression levels of biomarkers in a biological sample, e.g., a whole blood sample or component thereof, obtained from the subject. In some embodiments, a panel of several biomarkers is used to calculate the composite biomarker value. For example, in some embodiments, biomarkers used in the methods include, but are not limited to, any one or more of the 480 biomarkers listed in Table 2. Any number of total biomarkers can be selected from the 480 biomarkers (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 95, 100, 200, 300, or 400 biomarkers) listed in Table 2 and be used to generate the composite biomarker value. In some embodiments, the biomarkers include any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 or more biomarkers listed in Table 2.
In some embodiments, the biomarkers include any one or more pairs of biomarkers that can be generated from the 480 biomarkers listed in Table 2. For example, in some embodiments, the biomarkers include any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 or more pairs of biomarkers that can be generated from the 480 biomarkers listed in Table 2.
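For a sense of scale, the number of distinct pairs that can be generated from 480 biomarkers can be computed as below. This sketch assumes that a pair is unordered and consists of two different biomarkers; the four-gene subset is used only to show the enumeration and is not a preferred panel.

```python
import itertools
import math

n_biomarkers = 480  # number of biomarkers in Table 2, as stated above
n_pairs = math.comb(n_biomarkers, 2)  # unordered pairs of distinct biomarkers
print(n_pairs)  # 114960

# Enumerating the pairs for a small illustrative subset of gene symbols.
subset = ["SOCS3", "OVCA2", "MEF2A", "AQP9"]
pairs = list(itertools.combinations(subset, 2))
print(pairs)
```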
In some embodiments, the biomarkers include any one or more pairs of biomarkers listed in Table 3. For example, in some embodiments, the biomarkers include any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 pairs of biomarkers listed in Table 3. In particular embodiments, the composite biomarker value is calculated based on the expression levels of any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 pairs of biomarkers listed in Table 3.
In some embodiments, a biomarker used in methods described herein is selected from the group consisting of SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3. In some embodiments, at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 biomarkers selected from the group consisting of SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3 are used in methods described herein. In particular embodiments, one or more pairs of biomarkers selected from the group consisting of SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3 are used in methods described herein.
In some embodiments, a biomarker used in methods described herein is selected from the group consisting of SOCS3, SCX, AQP9, IPPK, CD163, ATP1B1, ZCCHC18, and CXorf65. In some embodiments, at least 2, 3, 4, 5, 6, 7, or 8 biomarkers selected from the group consisting of SOCS3, SCX, AQP9, IPPK, CD163, ATP1B1, ZCCHC18, and CXorf65 are used in methods described herein. In particular embodiments, one or more pairs of biomarkers selected from the group consisting of SOCS3, SCX, AQP9, IPPK, CD163, ATP1B1, ZCCHC18, and CXorf65 are used in methods described herein.
In some embodiments, a biomarker used in methods described herein is selected from the group consisting of OVCA2, MEF2A, PRR33, TNFSF8, MEF2B, HINT2, and IMP3. In some embodiments, at least 2, 3, 4, 5, 6, or 7 biomarkers selected from the group consisting of OVCA2, MEF2A, PRR33, TNFSF8, MEF2B, HINT2, and IMP3 are used in methods described herein. In particular embodiments, one or more pairs of biomarkers selected from the group consisting of OVCA2, MEF2A, PRR33, TNFSF8, MEF2B, HINT2, and IMP3 are used in methods described herein.
The biomarkers used in the present methods correspond to genes that are differentially expressed in blood in the presence of an MI event. The expression level of the individual biomarkers can be elevated or depressed in subjects undergoing MI. For example, in particular embodiments, the expression level of at least one biomarker (e.g., 2, 3, 4, 5, 6, 7, or 8) of SOCS3, SCX, AQP9, IPPK, CD163, ATP1B1, ZCCHC18, and CXorf65 of a subject undergoing MI is reduced relative to a reference expression level of the same biomarker of a control subject (i.e., a subject who is not undergoing MI). In particular embodiments, the composite biomarker value generated from the expression level of at least one biomarker (e.g., 2, 3, 4, 5, 6, 7, or 8) of SOCS3, SCX, AQP9, IPPK, CD163, ATP1B1, ZCCHC18, and CXorf65 of a subject undergoing MI exceeds a threshold value.
In another example, in particular embodiments, the expression level of at least one biomarker (e.g., 2, 3, 4, 5, 6, or 7) of OVCA2, MEF2A, PRR33, TNFSF8, MEF2B, HINT2, and IMP3 of a subject undergoing MI is increased relative to a reference expression level of the same biomarker of a control subject (i.e., a subject who is not undergoing MI). In particular embodiments, the composite biomarker value generated from the expression level of at least one biomarker (e.g., 2, 3, 4, 5, 6, or 7) of OVCA2, MEF2A, PRR33, TNFSF8, MEF2B, HINT2, and IMP3 of a subject undergoing MI exceeds a threshold value. The expression level of a biomarker can be positively or inversely correlated with the determination that the subject is undergoing MI, allowing the determination of an overall composite biomarker value that can be used to inform a diagnostic or treatment decision.
In some embodiments of the methods described herein, the composite biomarker value not exceeding a threshold value indicates that the subject is not undergoing the MI. The methods can further comprise the step e) of determining that the subject is not undergoing the MI by comparing the composite biomarker value to the threshold value. Once it is determined that the subject is not undergoing MI, the subject can be further evaluated by a physician for a condition selected from the group consisting of anxiety disorder, aortic dissection, aortic stenosis, asthma, esophagitis, heart failure, gastroenteritis, left ventricular embolism, musculoskeletal disorder, myocarditis, pericarditis, pneumonia, pulmonary embolism, and unstable angina. Further, the methods can include the step of discharging the subject from a clinical facility based on determining the subject is not undergoing the MI.
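As a non-limiting sketch of how a composite biomarker value might be compared with a threshold value, the Python example below scores a sample as the mean log2 expression of the genes described above as increased in MI minus the mean log2 expression of the genes described above as reduced in MI. This particular scoring formula, the function names, the example expression values, and the example threshold values are assumptions made for illustration; the present disclosure does not limit the composite biomarker value to this calculation.

```python
import math
from statistics import mean

# Gene sets taken from the paragraphs above.
INCREASED_IN_MI = ["OVCA2", "MEF2A", "PRR33", "TNFSF8", "MEF2B", "HINT2", "IMP3"]
REDUCED_IN_MI = ["SOCS3", "SCX", "AQP9", "IPPK", "CD163", "ATP1B1", "ZCCHC18", "CXorf65"]


def composite_score(expression):
    """Assumed scoring rule: mean log2 expression of the increased genes minus
    mean log2 expression of the reduced genes.

    `expression` maps gene symbol -> positive normalized expression level.
    Genes missing from `expression` are skipped.
    """
    up = [math.log2(expression[g]) for g in INCREASED_IN_MI if g in expression]
    down = [math.log2(expression[g]) for g in REDUCED_IN_MI if g in expression]
    if not up or not down:
        raise ValueError("need at least one gene from each set")
    return mean(up) - mean(down)


def classify(expression, threshold):
    """Compare the composite score with a threshold value."""
    if composite_score(expression) > threshold:
        return "MI indicated"
    return "MI not indicated"


# Hypothetical normalized expression levels for a four-gene panel.
sample = {"OVCA2": 8.0, "MEF2A": 4.0, "SOCS3": 2.0, "AQP9": 2.0}
print(composite_score(sample))  # (3 + 2) / 2 - (1 + 1) / 2 = 1.5
print(classify(sample, threshold=1.0))
```

In practice, the threshold value would be determined from training samples as described above rather than chosen by hand.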
Additional biomarkers that can be used in the present methods can be assessed and identified using any standard analysis method or metric, e.g., by analyzing data from samples taken from subjects who are undergoing MI or who have undergone MI. Suitable metrics and methods include Pearson correlation, Kendall rank correlation, Spearman rank correlation, t-test, other non-parametric measures, linear regression, non-linear regression, random forest and other tree-based methods, artificial neural networks, etc. In one embodiment, the feature selection uses univariate ranking with the absolute value of the Pearson correlation between the gene expression and outcome as the ranking metric. In some embodiments, features (genes) are selected using metrics that measure the effect size between different groups of subjects, i.e., subjects with MI and controls, e.g., controls from risk populations (e.g., risk subjects with CAD or stable/unstable angina). In some embodiments, features (genes) are further selected via greedy forward search optimized on training accuracy for a parsimonious set of genes. In some embodiments, features (genes) are selected via greedy forward search optimized on the area under the receiver operating characteristic curve (AUROC).
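The univariate ranking and greedy forward search described above can be sketched, under stated assumptions, as follows. The sketch assumes a binary outcome (1 for MI, 0 for control), scores a candidate gene set as the sum of sign-adjusted expression values, and stops when no remaining gene improves the AUROC. The scoring rule, function names, and toy data are illustrative assumptions and not the selection procedure actually used to derive Table 2.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def auroc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def rank_by_abs_pearson(expr, labels):
    """Rank genes by |Pearson r| with the outcome; expr maps gene -> values."""
    return sorted(expr, key=lambda g: abs(pearson(expr[g], labels)), reverse=True)


def greedy_forward_search(expr, labels, max_genes=5):
    """Add one gene at a time, keeping the gene that most improves the AUROC."""
    signs = {g: (1 if pearson(v, labels) >= 0 else -1) for g, v in expr.items()}
    selected, best = [], 0.0
    while len(selected) < max_genes:
        candidates = {}
        for g in expr:
            if g in selected:
                continue
            trial = selected + [g]
            scores = [sum(signs[t] * expr[t][i] for t in trial)
                      for i in range(len(labels))]
            candidates[g] = auroc(scores, labels)
        if not candidates:
            break
        gene = max(candidates, key=candidates.get)
        if candidates[gene] <= best:
            break  # no remaining gene improves the AUROC
        selected.append(gene)
        best = candidates[gene]
    return selected, best


# Toy data: three hypothetical genes measured in three cases and three controls.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [5, 6, 7, 1, 2, 3],
    "geneB": [1, 2, 1, 2, 1, 2],
    "geneC": [3, 2, 1, 6, 5, 4],
}
print(rank_by_abs_pearson(expr, labels))
print(greedy_forward_search(expr, labels))
```

Because the search is optimized on the same data used for ranking, performance estimated this way is optimistic; a held-out validation set or cross-validation would normally be used to confirm the selected set.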
In some embodiments, data from multiple sources is inputted to a multi-cohort analysis using appropriate software, e.g., the MetaIntegrator package. In some embodiments, effect size is calculated for each mRNA within a study between different groups of subjects, i.e., subjects with MI and controls, e.g., controls from risk populations (e.g., risk subjects with CAD or stable/unstable angina), e.g., as Hedges' g. In some embodiments, the pooled or summary effect size across all of the datasets is then computed, e.g., using DerSimonian and Laird's random effects model. In some embodiments, the effect size is then summarized and p values across all mRNAs corrected for multiple testing, e.g., based on Benjamini-Hochberg false discovery rate (FDR). In some embodiments, the p values across the studies are then combined, e.g., using Fisher's sum of logs method, and the log-sum of p values that each mRNA is up- or downregulated is computed, along with corresponding p values. In some embodiments, meta-analysis is performed, e.g., by performing leave-one-study-out (LOO) analysis by removing one dataset at a time. In some embodiments, a greedy forward search can be used to identify a parsimonious set of genes with the greatest discriminatory power to distinguish samples from different groups of subjects, i.e., subjects with MI and controls, e.g., controls from risk populations (e.g., risk subjects with CAD or stable/unstable angina).
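For illustration, the per-study effect size (Hedges' g) and the DerSimonian and Laird random effects pooling mentioned above can be written in plain Python as shown below. The formulas are the standard textbook forms; the function names and the toy inputs are hypothetical, and this sketch does not reproduce the MetaIntegrator package.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(cases, controls):
    """Return (g, var_g): the bias-corrected standardized mean difference."""
    n1, n2 = len(cases), len(controls)
    pooled_sd = sqrt(((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls))
                     / (n1 + n2 - 2))
    d = (mean(cases) - mean(controls)) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def dersimonian_laird(effects, variances):
    """Return (pooled_effect, standard_error, tau_squared) across studies."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at 0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    return pooled, sqrt(1 / sum(w_star)), tau2


# Toy example: one gene measured in a single small study.
g, var_g = hedges_g([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(g, var_g)

# Toy example: pooling three hypothetical per-study effect sizes.
print(dersimonian_laird([0.5, 0.5, 0.5], [0.1, 0.2, 0.1]))
```

The pooled effect sizes for all mRNAs could then be passed to a multiple-testing correction step, such as the Benjamini-Hochberg procedure noted above.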
In particular embodiments, a machine learning workflow is applied to the training data, e.g., using a separate validation set or using cross-validation. For example, hyperparameter tuning can be used over a search space of parameters, e.g., parameters known to be effective for model optimization for infectious disease diagnosis. Examples of classifiers that can be used include linear classifiers such as Support Vector Machine (SVM) with linear kernel, logistic regression, and multi-layer perceptron with linear activation function, and non-linear classifiers such as SVM with non-linear kernel. Feature selection can be performed using the gene expression data for the candidate biomarkers as independent variables and using the known outcome as the dependent variable. The different models can be evaluated, e.g., using plots based on sensitivity and false-positive rates for each model, and the decision threshold evaluated during the hyperparameter search, and using ROC-like plots based on pooled cross-validated probabilities for the best models. (See, e.g., Ramkumar et al., Development of a Novel Proteomic Risk-Classifier for Prognostication of Patients with Early-Stage Hormone Receptor-Positive Breast Cancer. Biomarker Insights, Vol. 13, 1-9, 2018,
As described in more detail below, data sets corresponding to the biomarker expression levels as described herein are used to create a composite biomarker value. The expression levels of the biomarkers can be assessed in any number of ways. In particular embodiments, the expression levels of the biomarkers are determined by measuring polynucleotide levels of the biomarkers. For example, once the biological sample (e.g., a whole blood sample or component thereof) has been collected and preserved, RNA can be extracted using any method, so long as it permits the preservation of the RNA for subsequent quantification of the expression levels of the biomarker genes and of any control genes to be used, e.g., housekeeping genes used as reference values for the biomarkers. RNA can be extracted, e.g., from preserved cells manually, or using a robotic apparatus, such as Qiacube (QIAGEN) with a commercial RNA extraction kit. In some embodiments, RNA extraction is not performed, e.g., for isothermal amplification methods. In such methods, expression levels can be determined directly through lysis of cells, and then, e.g., reverse transcription and amplification of mRNA.
In some embodiments, the reference nucleic acid is a housekeeping gene or a product thereof, such as a corresponding mRNA transcript. In some embodiments, the reference nucleic acid includes an mRNA transcript that is a pre-mRNA molecule, a 5′ capped mRNA molecule, a 3′ adenylated mRNA molecule, or a mature mRNA molecule. In particular embodiments, the reference nucleic acid is a mature mRNA molecule obtained from a mammalian host that is also the source of the test sample. In some embodiments, the housekeeping gene or product thereof is expressed at a relatively constant rate by a cell of the host, such that the expression rate of the housekeeping gene can be used as a reference point against the expression of other host genes or gene products thereof.
Exemplary human housekeeping genes suitable for use with the present methods include, but are not limited to, YWHAB, Chromosome 1 open reading frame 43 (C1orf43), Charged multivesicular body protein 2A (CHMP2A), ER membrane protein complex subunit 7 (EMC7), Glucose-6-phosphate isomerase (GPI), Proteasome subunit, beta type, 2 (PSMB2), Proteasome subunit, beta type, 4 (PSMB4), Member RAS oncogene family (RAB7A), Receptor accessory protein 5 (REEP5), small nuclear ribonucleoprotein D3 (SNRPD3), Valosin containing protein (VCP) and vacuolar protein sorting 29 homolog (VPS29). In some embodiments, any housekeeping gene provided at www.tau.ac.il/~elieis/HKG/ may be used (see, Eisenberg and Levanon, Trends Genet. (2013), 10:569-74). Other suitable housekeeping genes include, e.g., GAPDH, ubiquitin, 18S (18S rRNA, e.g., HGNC (Human Genome Nomenclature Committee) nos. 44278-44281, 37657), ACTB (Actin beta, e.g., HGNC no. 132), KPNA6 (Karyopherin subunit alpha 6, e.g., HGNC no. 6399), or RREB1 (ras-responsive element binding protein 1, e.g., HGNC no. 10449).
The levels of transcripts of the biomarker genes, or their levels relative to one another, and/or their levels relative to a reference gene such as a housekeeping gene, can be determined from the amount of mRNA, or polynucleotides derived therefrom, present in a biological sample. Polynucleotides can be detected and quantified by a variety of methods including, but not limited to, NanoString (e.g., nCounter analysis), microarray analysis, polymerase chain reaction (PCR) (e.g., quantitative PCR (qPCR), droplet digital PCR (ddPCR), reverse transcriptase polymerase chain reaction (RT-PCR), quantitative RT-PCR (qRT-PCR)), isothermal amplification (e.g., loop-mediated isothermal amplification (LAMP), reverse transcription LAMP (RT-LAMP), quantitative RT-LAMP (qRT-LAMP)), RPA amplification, ligase chain reaction, branched DNA amplification, nucleic acid sequence-based amplification (NASBA), strand displacement assay (SDA), transcription-mediated amplification, rolling circle amplification (RCA), helicase-dependent amplification (HDA), single primer isothermal amplification (SPIA), nicking and extension amplification reaction (NEAR), transcription mediated assay (TMA), CRISPR-Cas detection, direct hybridization without amplification onto a functionalized surface (e.g., graphene biosensor), serial analysis of gene expression (SAGE), internal DNA detection switch, northern blotting, RNA fingerprinting, sequencing methods, Qbeta replicase, strand displacement amplification, transcription based amplification systems, nuclease protection (S1 nuclease or RNAse protection assays), as well as methods disclosed in International Publication Nos. WO 88/10315 and WO 89/06700, and International Applications Nos. PCT/US87/00880 and PCT/US89/01025; herein incorporated by reference in their entireties, and methods using MacMan probes, flip probes, and TaqMan probes (see, e.g., Murray et al. (2014) J. Mol Diag. 16:6, pp 627-638).
See, e.g., Draghici, Data Analysis Tools for DNA Microarrays, Chapman and Hall/CRC, 2003; Simon et al., Design and Analysis of DNA Microarray Investigations, Springer, 2004; Real-Time PCR: Current Technology and Applications, Logan, Edwards, and Saunders eds., Caister Academic Press, 2009; Bustin, A-Z of Quantitative PCR (IUL Biotechnology, No. 5), International University Line, 2004; Velculescu et al. (1995) Science 270: 484-487; Matsumura et al. (2005) Cell. Microbiol. 7: 11-18; Serial Analysis of Gene Expression (SAGE): Methods and Protocols (Methods in Molecular Biology), Humana Press, 2008; each of which is herein incorporated by reference in its entirety.
In some embodiments, the biomarker gene expression is detected using a gene expression panel such as a NanoString nCounter, which allows the quantification of biomarker gene expression without the need for amplification or cDNA conversion. In such methods, RNA obtained from the blood or other biological sample from the subject is hybridized in solution to probes, e.g., a labeled reporter probe and a capture probe for each biomarker and control sequence. The target RNA-probe complexes are then purified and immobilized on a solid support, and then quantified, with each marker-specific probe having a specific fluorescent signature that allows the quantification of the specific marker. Such methods and the generation of probes, e.g., capture probes and reporter probes, for such applications are known in the art and are described, e.g., on the website nanostring.com.
For amplification-based methods such as qRT-PCR or qRT-LAMP, the primers can be obtained in any of a number of ways. For example, primers can be synthesized in the laboratory using an oligo synthesizer, e.g., as sold by Applied Biosystems, Biolytic Lab Performance, Sierra Biosystems, or others. Alternatively, primers and probes with any desired sequence and/or modification can be readily ordered from any of a large number of suppliers, e.g., ThermoFisher, Biolytic, IDT, Sigma-Aldrich, GenScript, etc.
Naturally occurring nucleotides, modified nucleotides, and labeled nucleotides can all be used in methods such as qRT-PCR and qRT-LAMP. A modified nucleotide can have a modified nucleobase, a modified sugar portion, and/or a modified internucleoside linkage. A modified nucleobase (or base) refers to a nucleobase having at least one change that is structurally distinguishable from a naturally-occurring nucleobase (e.g., adenine, guanine, cytosine, thymine, or uracil). In some embodiments, a modified nucleobase is functionally interchangeable with its naturally-occurring counterpart. Both naturally-occurring and modified nucleobases are capable of hydrogen bonding. Modified nucleobases may help to improve the stability of a polynucleotide, such as increasing its half-life and preventing intracellular degradation and proteolytic cleavage. Examples of modified nucleobases include, but are not limited to, 5-methylcytosine, 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyladenine, 6-methylguanine, 2-propyladenine, 2-propylguanine, 2-thiouracil, 2-thiothymine, 2-thiocytosine, 5-halouracil, 5-halocytosine, 5-propynyluracil, 5-propynylcytosine, 6-azouracil, 6-azocytosine, 6-azothymine, 5-uracil (pseudouracil), 4-thiouracil, 8-haloadenine, 8-aminoadenine, 8-thioladenine, 8-thioalkyladenine, 8-hydroxyladenine, 8-haloguanine, 8-aminoguanine, 8-thiolguanine, 8-thioalkylguanine, 8-hydroxylguanine, 5-halouracil, 5-bromouracil, 5-trifluoromethyluracil, 5-halocytosine, 5-bromocytosine, 5-trifluoromethylcytosine, 7-methylguanine, 7-methyladenine, 2-fluoroadenine, 2-aminoadenine, 8-azaguanine, 8-azaadenine, 7-deazaguanine, 7-deazaadenine, 3-deazaguanine, and 3-deazaadenine.
A modified sugar refers to a sugar having at least one change that is structurally distinguishable from a naturally-occurring sugar (e.g., deoxyribose in DNA or ribose in RNA). Modifications on modified sugars may help to improve the stability of a polynucleotide. In some embodiments, the sugar is a pentofuranosyl sugar. The pentofuranosyl sugar ring of a nucleoside may be modified in various ways including, but not limited to, addition of a substituent group, particularly, at the 2′ position of the ring; bridging two non-geminal ring atoms to form a bicyclic sugar (i.e., a locked sugar); and substitution of an atom or group such as -S-, -N(R)- or -C(R1)(R2)- for the ring oxygen. Examples of modified sugars include, but are not limited to, substituted sugars, especially 2′-substituted sugars having a 2′-F, 2′-OCH3 (2′-OMe), or a 2′-O(CH2)2-OCH3 (2′-O-methoxyethyl or 2′-MOE) substituent group; and bicyclic sugars. A bicyclic sugar refers to a modified pentofuranosyl sugar containing two fused rings. For example, a bicyclic sugar may have the 2′ ring carbon of the pentofuranose linked to the 4′ ring carbon by way of one or more carbons (i.e., a methylene) and/or heteroatoms (i.e., sulfur, oxygen, or nitrogen). The second ring in the sugar limits the flexibility of the sugar ring and thus constrains the oligonucleotide in a conformation that is favorable for base pairing interactions with its target nucleic acids. An example of a bicyclic sugar is a locked sugar, which is a pentofuranosyl sugar having the 2′-oxygen linked to the 4′ ring carbon by way of a carbon (i.e., a methylene) or a heteroatom (i.e., sulfur, oxygen, or nitrogen). In some embodiments, a locked sugar has the 2′-oxygen linked to the 4′ ring carbon by way of a carbon (i.e., a methylene). In other words, a locked sugar has a 4′-(CH2)-O-2′ bridge, such as α-L-methyleneoxy (4′-CH2-O-2′) and β-D-methyleneoxy (4′-CH2-O-2′). A nucleoside having a locked sugar is referred to as a locked nucleoside.
Other examples of bicyclic sugars include, but are not limited to, (6′S)-6′ methyl bicyclic sugar, aminooxy (4′-CH2-O—N(R)-2′) bicyclic sugar, oxyamino (4′-CH2-N(R)—O-2′) bicyclic sugar, wherein R is, independently, H, a protecting group or C1-C12 alkyl. The substituent at the 2′ position can also be selected from allyl, amino, azido, thio, O-allyl, O—C1-C10 alkyl, OCF3, O(CH2)2SCH3, O(CH2)2-O—N(Rm)(Rn), and O—CH2-C(═O)—N(Rm)(Rn), wherein each Rm and Rn is, independently, H or substituted or unsubstituted C1-C10 alkyl. In some embodiments, a modified sugar is an unlocked sugar. An unlocked sugar refers to an acyclic sugar that has a 2′, 3′-seco acyclic structure, where the bond between the 2′ carbon and the 3′ carbon in a pentofuranosyl ring is absent.
In other embodiments, a modified nucleotide can contain a naturally occurring or a modified internucleoside linkage or phosphate backbone. An internucleoside linkage refers to the backbone linkage that connects the nucleosides. An internucleoside linkage may be a naturally-occurring internucleoside linkage (i.e., a phosphate linkage, also referred to as a 3′ to 5′ phosphodiester linkage, which is found in DNA and RNA) or a modified internucleoside linkage. A modified internucleoside linkage refers to an internucleoside linkage having at least one change that is structurally distinguishable from a naturally-occurring internucleoside linkage. Modified internucleoside linkages may help to improve the stability of a polynucleotide. Examples of modified internucleoside linkages include, but are not limited to, a phosphorothioate linkage, a phosphorodithioate linkage, a phosphoramidate linkage, a phosphorodiamidate linkage, a thiophosphoramidate linkage, a thiophosphorodiamidate linkage, a phosphoramidate morpholino linkage, a thiophosphoramidate morpholino linkage, and a thiophosphorodiamidate morpholino linkage, which are known in the art and described in, e.g., Bennett and Swayze, Annu Rev Pharmacol Toxicol. 50:259-293, 2010. A phosphorothioate linkage is a 3′ to 5′ phosphodiester linkage that has a sulfur atom in place of a non-bridging oxygen in the phosphate backbone of an oligonucleotide. A phosphorodithioate linkage is a 3′ to 5′ phosphodiester linkage that has two sulfur atoms in place of the non-bridging oxygens in the phosphate backbone of an oligonucleotide. A thiophosphoramidate linkage refers to a 3′ to 5′ phospho-linkage that has a sulfur atom in place of a non-bridging oxygen and an NH group in place of the 3′-bridging oxygen in the phosphate backbone of an oligonucleotide.
Computer programs can be used in the design of primers with the required specificity and optimal amplification properties, such as Oligo version 5.0 (National Biosciences). PCR methods can also be implemented.
In some embodiments, microarrays are used to measure the levels of biomarkers. An advantage of microarray analysis is that the expression of each of the biomarkers can be measured simultaneously, and microarrays can be specifically designed to provide a diagnostic expression profile for a particular disease or condition. Microarrays are prepared by selecting probes which comprise a polynucleotide sequence, and then immobilizing such probes to a solid support or surface. For example, the microarray may comprise a support or surface with an ordered array of binding (e.g., hybridization) sites or “probes” each representing one of the biomarkers described herein. Preferably the microarrays are addressable arrays, and more preferably positionally addressable arrays. More specifically, each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position in the array (i.e., on the support or surface). Each probe is preferably covalently attached to the solid support at a single site. Conditions for preparing microarrays, for hybridization conditions, and for detection of bound probes can be implemented.
In some embodiments, RNA sequencing (RNA-seq) can be used to measure the expression levels of biomarkers. RNA-seq is a technique based on enumeration of RNA transcripts using next-generation sequencing methodologies. In general, a population of RNA (total or fractionated, such as poly(A)+) is converted to a library of cDNA fragments with adaptors attached to one or both ends. Each molecule, with or without amplification, is then sequenced in a high-throughput manner to obtain short sequences from one end (single-end sequencing) or both ends (paired-end sequencing). The reads are typically 30-400 bp, depending on the DNA-sequencing technology used. Any high-throughput sequencing technology can be used for RNA-seq, such as the Illumina IG, Applied Biosystems SOLiD, and Roche 454 Life Science systems. The Helicos Biosciences tSMS system has the added advantage of avoiding amplification of target cDNA. Following sequencing, the resulting reads are either aligned to a reference genome or reference transcripts, or assembled de novo without the genomic sequence to produce a genome-scale transcription map that consists of the transcriptional structure and/or level of expression for each gene.
As noted above, the “probe” to which a particular polynucleotide molecule specifically hybridizes contains a complementary polynucleotide sequence. The probes of the microarray typically consist of nucleotide sequences of, e.g., no more than 1,000 nucleotides, or of 10 to 1,000 nucleotides or 10-200, 10-30, 10-40, 20-50, 40-80, 50-150, or 80-120 nucleotides in length. The probes may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA. The polynucleotide sequences of the probes may also comprise DNA and/or RNA analogs, derivatives, or combinations thereof. For example, the probes can be modified at the base moiety, at the sugar moiety, or at the phosphate backbone (e.g., phosphorothioates). The polynucleotide sequences of the probes may be synthesized nucleotide sequences, such as synthetic oligonucleotide sequences. The probe sequences can be synthesized either enzymatically in vivo, enzymatically in vitro (e.g., by PCR), or non-enzymatically in vitro.
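As a simple illustration of the complementarity and length considerations described above, the following Python sketch derives a DNA probe sequence as the reverse complement of a target region and checks that its length falls within the 10 to 1,000 nucleotide range noted above. The target sequence and function names are hypothetical, and real probe design would also weigh factors such as binding energy and cross-hybridization potential.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def probe_for_target(target, min_len=10, max_len=1000):
    """Build a probe complementary to `target` and enforce the length range."""
    probe = reverse_complement(target)
    if not min_len <= len(probe) <= max_len:
        raise ValueError(f"probe length {len(probe)} outside {min_len}-{max_len}")
    return probe


# Hypothetical 12-nucleotide target region, written 5' to 3'.
print(probe_for_target("ATGGCCTTAGCA"))  # TGCTAAGGCCAT
```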
Probes are preferably selected using an algorithm that takes into account binding energies, base composition, sequence complexity, cross-hybridization potential based on probe similarities with other genes in the genome, and secondary structure. See Friend et al., International Patent Publication WO 01/05935, published Jan. 25, 2001; Hughes et al., Nat. Biotech. 19:342-7 (2001). An array will include both positive control probes, e.g., probes known to be complementary and hybridizable to sequences in the target polynucleotide molecules, and negative control probes, e.g., probes known to not be complementary and hybridizable to sequences in the target polynucleotide molecules. In addition, the present methods will include probes to both the biomarkers themselves, as well as to internal control sequences such as housekeeping genes, as described in more detail elsewhere herein.
In some embodiments, quantitative reverse transcriptase PCR (qRT-PCR) is used to determine the expression profiles of biomarkers (see, e.g., U.S. Patent Application Publication No. 2005/0048542A1; herein incorporated by reference in its entirety). The first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, Calif., USA), following the manufacturer's instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction.
In some embodiments, the PCR employs the Taq DNA polymerase, which has a 5′-3′ nuclease activity but lacks a 3′-5′ proofreading exonuclease activity. TAQMAN PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used. In such methods, two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction, and a third oligonucleotide, or probe, is designed to detect a nucleotide sequence located between the two PCR primers. The probe is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together as they are on the probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments disassociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.
TAQMAN RT-PCR can be performed using commercially available equipment, such as, for example, the ABI PRISM 7700 sequence detection system (Perkin-Elmer-Applied Biosystems, Foster City, Calif., USA), or the Lightcycler (Roche Molecular Biochemicals, Mannheim, Germany). In a preferred embodiment, the 5′ nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7700 sequence detection system. The system consists of a thermocycler, laser, charge-coupled device (CCD), camera and computer. The system includes software for running the instrument and for analyzing the data. 5′-Nuclease assay data are initially expressed as Ct, or the threshold cycle. Fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle (Ct).
To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by the experimental treatment. RNAs that can be used to normalize patterns of gene expression include mRNAs for the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and beta-actin.
In particular embodiments, the biomarker gene expression is determined using isothermal amplification. Isothermal amplification is a process in which a target nucleic acid is amplified using a constant, single amplification temperature (e.g., from about 30° C. to about 95° C.). Unlike standard PCR, an isothermal amplification reaction does not include multiple cycles of denaturation, hybridization, and extension of an annealed oligonucleotide to form a population of amplified target nucleic acid molecules (i.e., amplicons). There are various types of isothermal amplification known in the art, including but not limited to, loop-mediated isothermal amplification (LAMP), nucleic acid sequence based amplification (NASBA), recombinase polymerase amplification (RPA), rolling circle amplification (RCA), nicking enzyme amplification reaction (NEAR), and helicase dependent amplification (HDA).
In particular embodiments, the isothermal amplification is real-time quantitative isothermal amplification, in which a target nucleic acid is amplified at a constant temperature and the rate of amplification of the target nucleic acid is monitored by fluorescence, turbidity, or similar measures (e.g., NEAR or LAMP). In some cases, RNA (e.g., mRNA) is isolated from a biological sample and is used as a template to synthesize cDNA by reverse transcription. The cDNA molecules are amplified under isothermal amplification conditions such that the production of amplified target nucleic acid can be detected and quantitated.
In particular embodiments, the isothermal amplification is Loop-Mediated Isothermal Amplification (LAMP). LAMP offers selectivity and employs a polymerase and a set of specially designed primers that recognize distinct sequences in the target nucleic acid (see, e.g., Nixon et al., (2014) Biomolecular Detection and Quantification, 2:4-10; Schuler et al., (2016) Anal Methods., 8:2750-2755; and Schoepp et al., (2017) Sci. Transl. Med., 9:eaal3693). Unlike PCR, the target nucleic acid is amplified at a constant temperature (e.g., 60-65° C.) using multiple inner and outer primers and a polymerase having strand displacement activity. In some instances, an inner primer pair containing a nucleic acid sequence complementary to a portion of the sense and antisense strands of the target nucleic acid initiates LAMP. Following strand displacement synthesis by the inner primers, strand displacement synthesis primed by an outer primer pair can cause release of a single-stranded amplicon. The single-stranded amplicon may serve as a template for further synthesis primed by a second inner and second outer primer that hybridize to the other end of the target nucleic acid and produce a stem-loop nucleic acid structure. In subsequent LAMP cycling, one inner primer hybridizes to the loop on the product and initiates displacement and target nucleic acid synthesis, yielding the original stem-loop product and a new stem-loop product with a stem twice as long. Additionally, the 3′ terminus of an amplicon loop structure serves as an initiation site for self-templating strand synthesis, yielding a hairpin-like amplicon that forms an additional loop structure to prime subsequent rounds of self-templated amplification. The amplification continues with accumulation of many copies of the target nucleic acid.
The final products of the LAMP process are stem-loop nucleic acids with concatenated repeats of the target nucleic acid in cauliflower-like structures with multiple loops formed by annealing between alternately inverted repeats of a target nucleic acid sequence in the same strand.
In some embodiments, the isothermal amplification assay comprises a digital reverse-transcription loop-mediated isothermal amplification (dRT-LAMP) reaction for quantifying the target nucleic acid (see, e.g., Khorosheva et al., (2016) Nucleic Acids Research, 44:2 e10). Typically, LAMP assays produce a detectable signal (e.g., fluorescence) during the amplification reaction. In some embodiments, fluorescence can be detected and quantified. Any suitable method for detecting and quantifying fluorescence can be used. In some instances, a device such as the Applied Biosystems QuantStudio can be used to detect and quantify fluorescence from the isothermal amplification assay.
Any suitable method for detecting amplification of a target nucleic acid in a test sample by quantitative real-time isothermal amplification may be used to practice the present methods. In some embodiments, quantitative real-time isothermal amplification of a target nucleic acid in a test sample is determined by detecting one or more different (distinct) fluorescent labels attached to nucleotides or nucleotide analogs incorporated during isothermal amplification of the target nucleic acid (e.g., 5-FAM (522 nm), ROX (608 nm), FITC (518 nm) and Nile Red (628 nm)). In another embodiment, quantitative real-time isothermal amplification of a target nucleic acid in a test sample can be determined by detection of a single fluorophore species (e.g., ROX (608 nm)) attached to nucleotides or nucleotide analogs incorporated during isothermal amplification of the target nucleic acid. In some embodiments, each fluorophore species used emits a fluorescent signal that is distinct from any other fluorophore species, such that each fluorophore can be readily detected among other fluorophore species present in the assay.
In some embodiments, methods of detecting amplification of a target nucleic acid in a test sample by quantitative real-time isothermal amplification can include using intercalating fluorescent dyes, such as SYTO dyes (SYTO 9 or SYTO 82). In some embodiments, methods of detecting amplification of a target nucleic acid in a test sample by quantitative real-time isothermal amplification can include using unlabeled primers to isothermally amplify the target nucleic acid in the test sample, and a labeled probe (e.g., having a fluorophore) to detect isothermal amplification of the target nucleic acid in the test sample. In some embodiments, unlabeled primers are used to isothermally amplify a target nucleic acid present in the test sample, and a probe is used having a 5-FAM dye label on the 5′ end and a minor groove binder (MGB) and non-fluorescent quencher on the 3′ end to detect isothermal amplification of the target nucleic acid (e.g., TaqMan Gene Expression Assays from ThermoFisher Scientific).
In some embodiments, detecting amplification of the target nucleic acid in the test sample is performed using a one-step or two-step quantitative real-time isothermal amplification assay. In a one-step quantitative real-time isothermal amplification assay, reverse transcription is combined with quantitative isothermal amplification to form a single quantitative real-time isothermal amplification assay. A one-step assay reduces the number of hands-on manipulations as well as the total time to process a test sample. A two-step assay comprises a first step, in which reverse transcription is performed, followed by a second step, in which quantitative isothermal amplification is performed. It is within the scope of the skilled artisan to determine whether a one-step or two-step assay should be performed.
In some embodiments, a composite biomarker value is calculated based on the Tt (time to threshold) values or a parameter that measures the rapidity of detected signal rise for each of the tested biomarkers. This may be accomplished by, e.g., establishing standard curves for the isothermal or other amplification of the target nucleic acid (e.g., biomarker) and the reference nucleic acid (e.g., housekeeping gene). The standard curves can be obtained by performing real-time isothermal amplification assays using quantitated calibrator samples with multiple known input concentrations. Appropriate methods are provided in, e.g., PCT Publication No. WO 2020/061217, the entire disclosure of which is herein incorporated by reference.
For example, in some embodiments, to generate a standard curve, quantitated calibrator samples are obtained by performing serial dilutions of a quantitated material. For example, a template is serially diluted in a buffer at 10-fold concentration intervals, yielding templates covering a range of concentrations from, e.g., approximately 10⁹ copies/μL to approximately 10² copies/μL. The precise concentration of each calibrator sample can be determined using methods known in the art.
To obtain a standard curve, a real-time amplification assay is performed for each aliquot with a known quantity (e.g., 1 μL) of a respective calibrator sample with a respective concentration of the target nucleic acid. In a real-time amplification assay for each respective calibrator sample, the intensity of the fluorescence emitted by intercalating fluorescent dyes (e.g., dsDNA dyes) or fluorescent labels for the target nucleic acid is measured as a function of time. For example, a plot can be generated of fluorescence intensity as a function of time in a real-time quantitative amplification assay. A dashed line can be used to represent a pre-determined threshold intensity, and the elapsed time from the moment when the amplification is started to the moment when the fluorescence reaches the threshold intensity is the time-to-threshold Tt. A respective time-to-threshold value can be determined from each respective fluorescence curve as a function of time. Thus, time-to-threshold values Ttₙ, Ttₙ₊₁, Ttₙ₊₂, etc., are obtained for the different calibrator samples.
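As an illustration of how a time-to-threshold value might be read from a fluorescence curve, the following is a minimal sketch and not the method of the disclosure. The function name `time_to_threshold` is hypothetical, and the use of linear interpolation between the two readings that bracket the threshold is an assumption, since the text above does not specify how the crossing time is resolved between readings.

```python
from typing import Optional, Sequence


def time_to_threshold(times: Sequence[float],
                      fluorescence: Sequence[float],
                      threshold: float) -> Optional[float]:
    """Return the elapsed time at which the curve first reaches `threshold`.

    Assumption: the crossing time is linearly interpolated between the
    two readings that bracket the threshold. Returns None if the
    threshold is never reached during the run.
    """
    if len(times) != len(fluorescence):
        raise ValueError("times and fluorescence must have the same length")
    for i, value in enumerate(fluorescence):
        if value >= threshold:
            if i == 0:
                # Already at or above threshold at the first reading.
                return float(times[0])
            t0, t1 = times[i - 1], times[i]
            f0, f1 = fluorescence[i - 1], value
            # f0 < threshold <= f1 here, so f1 - f0 is strictly positive.
            return t0 + (threshold - f0) * (t1 - t0) / (f1 - f0)
    return None


# Hypothetical readings (time in minutes, fluorescence in arbitrary units).
tt = time_to_threshold([0, 1, 2, 3, 4], [0.0, 0.0, 1.0, 3.0, 9.0], threshold=2.0)
```

Applying the same function to the curve of each calibrator sample would yield one time-to-threshold value per known input concentration.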
For exponential amplifications, the time-to-threshold is linearly proportional to the logarithm (e.g., logarithm to base 10) of the starting copy number (also referred to as template abundance). A scatter plot of data points can be generated from the fluorescence curves. Each data point represents a data pair [Log10(CopyNumber), Tt] (note that CopyNumber refers to the starting number of copies of a nucleic acid in an amplification assay). In some embodiments, the data points fall approximately on a straight line. A linear regression is then performed on the data points in the plot to obtain the straight line that best fits the data points with the least amount of total deviation. The result of the linear regression is a straight line represented by the following equation,
$$Tt = m \cdot \log_{10}(\mathrm{CopyNumber}) + b \qquad (1)$$
where m is the slope of the line, and b is the y-intercept. The slope m reflects the efficiency of the isothermal amplification of the target nucleic acid; b represents the time-to-threshold at which Log10(CopyNumber) equals zero, i.e., at a starting copy number of one. The straight line represented by Equation (1) is referred to as the standard curve.
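The linear regression that yields the standard curve can be sketched as follows. This is a minimal ordinary least-squares example under stated assumptions: the function name `fit_standard_curve` and the calibrator values are hypothetical, and a plain least-squares fit is assumed because the text does not name a specific regression procedure.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(copy_numbers: Sequence[float],
                       tt_values: Sequence[float]) -> Tuple[float, float]:
    """Fit Tt = m * log10(CopyNumber) + b by least squares; return (m, b)."""
    if len(copy_numbers) != len(tt_values) or len(copy_numbers) < 2:
        raise ValueError("need at least two paired calibrator points")
    x = [math.log10(c) for c in copy_numbers]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(tt_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("calibrators must span more than one concentration")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, tt_values))
    m = sxy / sxx               # slope of the standard curve
    b = mean_y - m * mean_x     # y-intercept (Tt where log10(CopyNumber) = 0)
    return m, b


# Hypothetical calibrators: 10-fold dilutions and their measured Tt (minutes).
m, b = fit_standard_curve([1e2, 1e3, 1e4, 1e5], [30.0, 26.0, 22.0, 18.0])
```

With these illustrative values the slope is negative, reflecting that samples with more starting template reach the threshold sooner.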
In some embodiments, replicates (e.g., triplicates) of isothermal amplification assays may be run for each sample in order to gain a higher level of confidence in the data. Replicate time-to-threshold values can be averaged, and standard deviations can be calculated.
Once the standard curve is established for a given isothermal amplification assay, the standard curve can be used to convert a time-to-threshold value to a starting copy number for future runs of the amplification assay with unknown starting numbers of copies of the target nucleic acid, using the following equation,
$$\mathrm{CopyNumber} = 10^{(Tt - b)/m} \qquad (2)$$
Normally, the data points for very low copy numbers or very high copy numbers may fall off the straight line. The range of copy numbers within which the data points can be represented by the straight line is referred to as the dynamic range of the standard curve. The linear relationship between the time-to-threshold and the logarithm of the copy number represented by the standard curve is valid only within the dynamic range.
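Converting a measured time-to-threshold back to a starting copy number amounts to inverting the standard curve, which can be sketched as follows. The function name `copy_number_from_tt`, the slope and intercept values, and the dynamic-range bounds are all hypothetical, and returning a flag for out-of-range estimates (rather than rejecting them) is a design assumption of this sketch.

```python
from typing import Tuple


def copy_number_from_tt(tt: float, m: float, b: float,
                        dynamic_range: Tuple[float, float]) -> Tuple[float, bool]:
    """Invert Tt = m * log10(CopyNumber) + b.

    Returns (estimated copy number, within_dynamic_range). An estimate
    outside the dynamic range should not be trusted, because the linear
    relationship is only valid inside that range.
    """
    if m == 0:
        raise ValueError("slope must be non-zero")
    copies = 10 ** ((tt - b) / m)
    low, high = dynamic_range
    return copies, low <= copies <= high


# Hypothetical standard curve (m = -4, b = 38) valid from 1e2 to 1e9 copies.
copies, ok = copy_number_from_tt(22.0, m=-4.0, b=38.0, dynamic_range=(1e2, 1e9))
```

In this example a time-to-threshold of 22 maps to roughly 10,000 starting copies, which lies inside the assumed dynamic range.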
If the amplification efficiencies for a target nucleic acid and a reference nucleic acid are different for a given isothermal amplification assay, it may be necessary to obtain separate standard curves for the target nucleic acid and the reference nucleic acid. Thus, two sets of real-time isothermal amplification assays may be performed, one set for establishing the standard curve for the target nucleic acid, the other set for establishing the standard curve for the reference nucleic acid. In cases where multiple target nucleic acids are considered (e.g., for a panel of seven biomarkers as described herein), a standard curve for each target nucleic acid may be obtained.
In some embodiments, the standard curves are generated prior to obtaining a test sample. That is, the standard curves are not generated on-board with the quantitative isothermal amplification of the test sample. Such standard curves may be referred to as off-board standard curves. Off-board standard curves may be used for estimating relative abundance values (i.e., expression levels). For example, for a test sample of unknown input concentration of a target nucleic acid, a first real-time isothermal amplification assay is performed for a first aliquot of the test sample to obtain a first time-to-threshold value with respect to the target nucleic acid. A second real-time isothermal amplification assay is then performed for a second aliquot of the test sample to obtain a second time-to-threshold value with respect to a reference nucleic acid. The first aliquot and the second aliquot contain substantially the same amount of the test sample. The first time-to-threshold value may then be converted into a starting number of copies of the target nucleic acid using the standard curve of the target nucleic acid. Similarly, the second time-to-threshold value may be converted into a starting number of copies of the reference nucleic acid using the standard curve of the reference nucleic acid. The starting number of copies of the target nucleic acid is then normalized against that of the reference nucleic acid to obtain a relative abundance value (i.e., expression level).
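The normalization step described above can be sketched as follows. This is a minimal example that assumes normalization means dividing the target copy number by the reference copy number; the function names and the standard-curve parameters are hypothetical.

```python
from typing import Tuple

Curve = Tuple[float, float]  # (slope m, intercept b) of an off-board standard curve


def copies_from_tt(tt: float, curve: Curve) -> float:
    """Convert a time-to-threshold to starting copies using a standard curve."""
    m, b = curve
    return 10 ** ((tt - b) / m)


def relative_abundance(tt_target: float, tt_reference: float,
                       target_curve: Curve, reference_curve: Curve) -> float:
    """Starting copies of the target divided by those of the reference.

    Assumption: both aliquots contain substantially the same amount of
    sample, so the ratio reflects relative expression.
    """
    target_copies = copies_from_tt(tt_target, target_curve)
    reference_copies = copies_from_tt(tt_reference, reference_curve)
    return target_copies / reference_copies


# Hypothetical curves: target (m=-4, b=38), reference (m=-5, b=45).
ratio = relative_abundance(26.0, 20.0, (-4.0, 38.0), (-5.0, 45.0))
```

Here the target converts to about 1,000 copies and the reference to about 100,000 copies, giving a relative abundance of about 0.01.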
In cases where the amplification efficiencies for a target nucleic acid and a reference nucleic acid have approximately the same value that is known, relative abundance (i.e., expression level) may be obtained directly from time-to-threshold values without using standard curves.
To determine whether the subject is undergoing MI, a calculation is applied to the biomarker expression data from the subject to determine a composite biomarker value or biomarker score that is indicative of the probability of the subject undergoing MI. In some embodiments, the calculation for the composite biomarker value or biomarker score can be the geometric mean of the normalized, log2-transformed expression of the up-regulated mRNA minus that of the down-regulated mRNA from the final biomarker gene signature. The composite biomarker values or biomarker scores can be scaled for comparison between datasets and used for receiver operating characteristic (ROC) curve and area under curve (AUC) analyses as performance metrics of the selected biomarkers.
In some embodiments, the calculation for the composite biomarker value or biomarker score can be the geometric mean of the normalized, log2-transformed expression of the up-regulated mRNA minus that of the down-regulated mRNA from the biomarker gene signature comprising one or more biomarkers (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 95, 100, 200, 300, or 400 biomarkers) listed in Table 2.
In some embodiments, the calculation for the composite biomarker value or biomarker score can be the geometric mean of the normalized, log2-transformed expression of the up-regulated mRNA minus that of the down-regulated mRNA from the biomarker gene signature comprising one or more pairs of biomarkers (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 pairs of biomarkers) listed in Table 3.
In some embodiments, the calculation for the composite biomarker value or biomarker score can be the geometric mean of the normalized, log2-transformed expression of the up-regulated mRNA minus that of the down-regulated mRNA from the biomarker gene signature comprising one or more biomarkers (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 biomarkers) selected from the group consisting of SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3.
In some embodiments, the calculation for the composite biomarker value or biomarker score can be the geometric mean of the normalized, log2-transformed expression of the up-regulated mRNA minus that of the down-regulated mRNA from the biomarker gene signature comprising one or more biomarkers (e.g., 1, 2, 3, 4, 5, 6, 7, or 8 biomarkers) selected from the group consisting of SOCS3, SCX, AQP9, IPPK, CD163, ATP1B1, ZCCHC18, and CXorf65.
In some embodiments, the calculation for the composite biomarker value or biomarker score can be the geometric mean of the normalized, log2-transformed expression of the up-regulated mRNA minus that of the down-regulated mRNA from the biomarker gene signature comprising one or more biomarkers (e.g., 1, 2, 3, 4, 5, 6, or 7 biomarkers) selected from the group consisting of OVCA2, MEF2A, PRR33, TNFSF8, MEF2B, HINT2, and IMP3.
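The difference-of-geometric-means calculation described in the preceding paragraphs can be sketched as follows. This is an illustrative reading, not a definitive implementation: the function names are hypothetical, the expression values are invented, and the sketch assumes the normalized, log2-transformed values are strictly positive so that a geometric mean is defined.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive values."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def composite_score(up_regulated: Sequence[float],
                    down_regulated: Sequence[float]) -> float:
    """Geometric mean of up-regulated genes minus that of down-regulated genes.

    Inputs are assumed to be normalized, log2-transformed expression values.
    """
    return geometric_mean(up_regulated) - geometric_mean(down_regulated)


# Invented expression values for two up-regulated and two down-regulated genes.
score = composite_score(up_regulated=[4.0, 9.0], down_regulated=[2.0, 8.0])
```

With these invented inputs the two geometric means are 6 and 4, so the composite score is 2. A higher score would then correspond to relatively higher expression of the up-regulated genes.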
In some embodiments, the composite biomarker value or biomarker score can be calculated, e.g., by taking the sum, product, or quotient of the gene expression levels of the biomarkers, taken in terms of their absolute levels or their relative levels as compared to control genes, e.g., housekeeping genes, or by inputting them into a linear or nonlinear algorithm that incorporates at least the measured expression levels into an interpretable value.
In some embodiments, a threshold or cut-off value is suitably determined, and is optionally a predetermined value. In particular embodiments, the threshold value is predetermined in the sense that it is fixed, for example, based on previous experience with the assay and/or a population of subjects with a given outcome or outcomes, e.g., with a population of 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, or more subjects who underwent or who did not undergo MI. Alternatively, the predetermined value can also indicate that the method of arriving at the threshold is predetermined or fixed even if the particular value varies among assays or can even be determined for every assay run.
For the statistical analyses described herein, e.g., for the selection of biomarkers to be included in the calculation of a composite biomarker value or in the calculation of a probability or likelihood of a subject undergoing MI, as well as for diagnostic or therapeutic assessments made in view of a given composite biomarker value, other relevant information can also be considered, such as clinical data regarding the symptoms presented by each individual. This can include demographic information such as age, race, and sex; information regarding a presence, absence, degree, stage, severity or progression of a condition, phenotypic information, such as details of phenotypic traits, genetic or genetically regulated information, amino acid or nucleotide related genomics information, results of other tests including imaging, biochemical and hematological assays, other physiological scores, or the like.
As described above, the abundance values (i.e., expression levels) for the individual biomarker genes in the biological sample can be combined using a mathematical formula or a machine learning or other algorithm to produce a single composite biomarker value that can indicate the likelihood of a subject undergoing MI. In these embodiments, the produced value carries more predictive power than any individual gene level alone.
In some embodiments, types of algorithms for integrating multiple biomarkers into a single composite biomarker value or biomarker score may include, but are not limited to, a difference of geometric means, a difference of arithmetic means, a difference of sums, a simple sum, and the like. In some embodiments, a composite biomarker value may be estimated based on the relative abundance values (i.e., expression levels) of multiple biomarkers using machine-learning models, such as a regression model, a tree-based machine-learning model, a support vector machine (SVM) model, an artificial neural network (ANN) model, or the like.
Biomarker data may also be analyzed by a variety of methods to determine the statistical significance of differences in observed levels of biomarkers between test and reference expression profiles in order to evaluate the probability of a subject undergoing MI. In certain embodiments, patient data is analyzed by one or more methods including, but not limited to, multivariate linear discriminant analysis (LDA), receiver operating characteristic (ROC) analysis, principal component analysis (PCA), ensemble data mining methods, significance analysis of microarrays (SAM), cell specific significance analysis of microarrays (csSAM), spanning-tree progression analysis of density-normalized events (SPADE), and multi-dimensional protein identification technology (MUDPIT) analysis. (See, e.g., Hilbe (2009) Logistic Regression Models, Chapman & Hall/CRC Press; McLachlan (2004) Discriminant Analysis and Statistical Pattern Recognition. Wiley Interscience; Zweig et al. (1993) Clin. Chem. 39:561-577; Pepe (2003) The statistical evaluation of medical tests for classification and prediction, New York, N.Y.: Oxford; Sing et al. (2005) Bioinformatics 21:3940-3941; Tusher et al. (2001) Proc. Natl. Acad. Sci. U.S.A. 98:5116-5121; Oza (2006) Ensemble data mining, NASA Ames Research Center, Moffett Field, Calif., USA; English et al. (2009) J. Biomed. Inform. 42(2):287-295; Zhang (2007) Bioinformatics 8: 230; Shen-Orr et al. (2010) Journal of Immunology 184:144-130; Qiu et al. (2011) Nat. Biotechnol. 29(10):886-891; Ru et al. (2006) J. Chromatogr. A. 1111(2):166-174; Jolliffe, Principal Component Analysis (Springer Series in Statistics, 2nd edition, Springer, NY, 2002); Koren et al. (2004) IEEE Trans Vis Comput Graph 10:459-470; herein incorporated by reference in their entireties.)
It is not necessary that all of the biomarkers are elevated or depressed relative to control levels in a biological sample from a given subject to give rise to a determination on whether the subject is undergoing MI. For example, for a given biomarker level there can be some overlap between individuals falling into different probability categories. However, collectively the combined levels for all of the biomarker genes included in the assay will give rise to a value that, when compared to a threshold value, e.g., a threshold value derived from at least 50, 100, 150, 200, 250, 300, 350, 400, 500 or more subjects who have undergone MI, allows a determination concerning whether the subject is undergoing MI. For example, for a determination that a subject has a high likelihood of undergoing MI, the threshold value could be such that, across a population of at least 100 subjects who underwent MI and 100 subjects who did not undergo MI, at least 90% of the subjects who underwent MI are above the threshold. In another example, for a determination that a subject has a low likelihood of undergoing MI, the threshold value could be such that, across a population of at least 100 subjects who underwent MI and 100 subjects who did not undergo MI, at least 90% of the subjects who did not undergo MI are below the threshold. It will be appreciated that in any given assay there can be more than one threshold, e.g., a threshold in one direction that indicates that a subject is undergoing MI, and a threshold in the other direction that indicates that a subject is not undergoing MI.
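One way to derive such thresholds from a reference population is sketched below. This is an illustration only: the function names and scores are hypothetical, and subjects whose score equals the threshold are counted as meeting it, which is an assumption the text does not address.

```python
import math
from typing import Sequence


def threshold_with_fraction_at_or_above(scores: Sequence[float],
                                        fraction: float = 0.9) -> float:
    """Largest observed score such that at least `fraction` of scores are >= it."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores, reverse=True)
    # Small tolerance guards against floating-point error in fraction * n.
    k = max(1, math.ceil(fraction * len(ordered) - 1e-9))
    return ordered[k - 1]


def threshold_with_fraction_at_or_below(scores: Sequence[float],
                                        fraction: float = 0.9) -> float:
    """Smallest observed score such that at least `fraction` of scores are <= it."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    k = max(1, math.ceil(fraction * len(ordered) - 1e-9))
    return ordered[k - 1]


# Invented composite scores for subjects who did and did not undergo MI.
mi_scores = list(range(1, 11))        # 1, 2, ..., 10
non_mi_scores = list(range(-10, 0))   # -10, -9, ..., -1
mi_threshold = threshold_with_fraction_at_or_above(mi_scores, 0.9)
non_mi_threshold = threshold_with_fraction_at_or_below(non_mi_scores, 0.9)
```

With these invented scores, nine of the ten MI subjects score at or above 2, and nine of the ten non-MI subjects score at or below -2, so the two functions return 2 and -2 respectively.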
As used herein, the terms “probability” and “risk” with respect to a given outcome refer to the conditional probability that subjects with a particular value actually have the condition (e.g., undergoing MI) based on a given mathematical model. An increased probability or risk, for example, can be relative or absolute and can be expressed qualitatively or quantitatively. For instance, an increased risk can be expressed as simply determining the subject's value and placing the test subject in an “increased risk” category, based upon previous population studies. Alternatively, a numerical expression of the test subject's increased risk can be determined based upon an analysis of the composite biomarker value.
In some embodiments, likelihood is assessed by comparing the level of a composite biomarker value or biomarker score to one or more preselected or threshold levels. Threshold values can be selected that provide an acceptable ability to predict the likelihood of a subject undergoing MI. In illustrative examples, receiver operating characteristic (ROC) curves are calculated by plotting the value of a composite biomarker value in two populations in which a first population has a first condition (e.g., undergoing MI) and a second population has a second condition (e.g., not undergoing MI).
For any particular biomarker, a distribution of biomarker levels for subjects with and without a disease will likely overlap, and some overlap will be present for composite biomarker values as well. Under such conditions, a test does not absolutely distinguish a first condition and a second condition with 100% accuracy, and the area of overlap indicates where the test cannot distinguish the first condition and the second condition. A threshold value is selected, above which (or below which, depending on how a composite biomarker value changes with a specified condition or prognosis) the test is considered to be “positive” and below which the test is considered to be “negative.” The area under the ROC curve (AUC) provides the C-statistic, which is a measure of the probability that the perceived measurement will allow correct identification of a condition (see, e.g., Hanley et al., Radiology 143: 29-36 (1982)).
In some embodiments, a positive likelihood ratio, negative likelihood ratio, odds ratio, and/or AUC or receiver operating characteristic (ROC) values are used as a measure of a method's ability to predict the likelihood of a subject undergoing MI. As used herein, the term “likelihood ratio” is the probability that a given test result would be observed in a subject with a condition or outcome of interest (e.g., undergoing MI) divided by the probability that that same result would be observed in a patient without the condition or outcome of interest (e.g., not undergoing MI). Thus, a positive likelihood ratio is the probability of a positive result observed in subjects with the specified condition or outcome divided by the probability of a positive result in subjects without the specified condition or outcome. A negative likelihood ratio is the probability of a negative result in subjects with the specified condition or outcome divided by the probability of a negative result in subjects without the specified condition or outcome.
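The likelihood ratios can be computed from the counts of a two-by-two classification table, as in the following minimal sketch. It follows the general definition given above (the probability of a result in subjects with the condition divided by the probability of the same result in subjects without it). The function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def likelihood_ratios(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Return (positive LR, negative LR) from confusion-table counts.

    tp/fn: subjects with the condition who test positive/negative.
    fp/tn: subjects without the condition who test positive/negative.
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both groups must contain at least one subject")
    p_pos_condition = tp / (tp + fn)   # sensitivity
    p_pos_control = fp / (fp + tn)     # 1 - specificity
    p_neg_condition = fn / (tp + fn)   # 1 - sensitivity
    p_neg_control = tn / (fp + tn)     # specificity
    lr_pos = p_pos_condition / p_pos_control if p_pos_control > 0 else math.inf
    lr_neg = p_neg_condition / p_neg_control if p_neg_control > 0 else math.inf
    return lr_pos, lr_neg


# Invented counts: 100 subjects undergoing MI and 100 subjects not undergoing MI.
lr_pos, lr_neg = likelihood_ratios(tp=90, fn=10, fp=20, tn=80)
```

With these invented counts the positive likelihood ratio is 0.9/0.2 = 4.5 and the negative likelihood ratio is 0.1/0.8 = 0.125.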
The term “odds ratio,” as used herein, refers to the ratio of the odds of an event occurring in one group (e.g., not undergoing MI) to the odds of it occurring in another group (e.g., undergoing MI), or to a data-based estimate of that ratio. The term “area under the curve” or “AUC” refers to the area under the curve of a receiver operating characteristic (ROC) curve, both of which are well known in the art. AUC measures are useful for evaluating the accuracy of a classifier across the complete decision threshold range. Classifiers with a greater AUC have a greater capacity to classify unknowns correctly between two or more groups of interest (e.g., subjects undergoing or not undergoing MI, or subjects with a low, intermediate, or high probability of undergoing MI). ROC curves are useful for plotting the performance of a particular feature (e.g., any of the biomarker expression levels or composite biomarker values described herein and/or any item of additional biomedical information) in distinguishing or discriminating between two populations (e.g., subjects undergoing or not undergoing MI). Typically, the feature data across the entire population (e.g., the cases and controls) are sorted in ascending order based on the value of a single feature. Then, for each value for that feature, the true positive and false positive rates for the data are calculated. The sensitivity is determined by counting the number of cases above the value for that feature and then dividing by the total number of cases. The specificity is determined by counting the number of controls below the value for that feature and then dividing by the total number of controls.
Although this refers to scenarios in which a feature is elevated in cases compared to controls, it also applies to scenarios in which a feature is lower in cases compared to the controls (in such a scenario, samples below the value for that feature would be counted). ROC curves can be generated for a single feature as well as for other single outputs; for example, two or more features can be mathematically combined (e.g., added, subtracted, multiplied, etc.) to produce a single value, and this single value can be plotted in a ROC curve. Additionally, any combination of multiple features, in which the combination derives a single output value, can be plotted in a ROC curve. These combinations of features can comprise a test. The ROC curve is the plot of the sensitivity of a test against 1-specificity of the test, where sensitivity is traditionally presented on the vertical axis and 1-specificity is traditionally presented on the horizontal axis. Thus, “AUC ROC values” are equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
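The ROC construction and the AUC described above can be sketched as follows for a feature that is elevated in cases. The function names and feature values are hypothetical; counting a value equal to the cutoff as positive, and scoring tied case/control pairs as one half, are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(cases: Sequence[float], controls: Sequence[float],
                            cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when values >= cutoff are called positive."""
    sensitivity = sum(1 for v in cases if v >= cutoff) / len(cases)
    specificity = sum(1 for v in controls if v < cutoff) / len(controls)
    return sensitivity, specificity


def roc_points(cases: Sequence[float],
               controls: Sequence[float]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) at every observed value, plus (0, 0)."""
    points = {(0.0, 0.0)}
    for cutoff in set(cases) | set(controls):
        sens, spec = sensitivity_specificity(cases, controls, cutoff)
        points.add((1.0 - spec, sens))
    return sorted(points)


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Numerically integrate the ROC curve with the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_by_ranking(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Probability that a random case outranks a random control (ties = 0.5)."""
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Invented feature values for three cases and three controls.
cases, controls = [3.0, 4.0, 5.0], [1.0, 2.0, 3.5]
auc_area = auc_trapezoid(roc_points(cases, controls))
auc_rank = auc_by_ranking(cases, controls)
```

For these invented values, eight of the nine case/control pairs are ranked correctly, so both calculations give an AUC of 8/9, illustrating the equivalence between the area under the ROC curve and the ranking probability.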
In some embodiments, at least two (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more) biomarker genes are selected to discriminate between subjects with a first condition or outcome and subjects with a second condition or outcome with at least about 70%, 75%, 80%, 85%, 90%, or 95% accuracy, or having a C-statistic of at least about 0.70, 0.75, 0.80, 0.85, 0.90, or 0.95.
In the case of a positive likelihood ratio, a value of 1 indicates that a positive result is equally likely among subjects in both the “condition” and “control” groups (e.g., in subjects that are undergoing or not undergoing MI); a value greater than 1 indicates that a positive result is more likely in the condition group (e.g., in subjects undergoing MI); and a value less than 1 indicates that a positive result is more likely in the control group (e.g., in subjects not undergoing MI). In this context, “condition” is meant to refer to a group having one characteristic (e.g., undergoing MI) and “control” group lacking the same characteristic (e.g., not undergoing MI). In the case of a negative likelihood ratio, a value of 1 indicates that a negative result is equally likely among subjects in both the “condition” and “control” groups; a value greater than 1 indicates that a negative result is more likely in the “condition” group; and a value less than 1 indicates that a negative result is more likely in the “control” group.
In the case of an odds ratio, a value of 1 indicates that a positive result is equally likely among subjects in both the “condition” and “control” groups; a value greater than 1 indicates that a positive result is more likely in the “condition” group; and a value less than 1 indicates that a positive result is more likely in the “control” group. In the case of an AUC ROC value, this is computed by numerical integration of the ROC curve. The range of this value can be 0.5 to 1.0. A value of 0.5 indicates that a classifier (e.g., a biomarker level) cannot discriminate between cases and controls (e.g., undergoing MI vs not undergoing MI), while 1.0 indicates perfect diagnostic accuracy. In certain embodiments, biomarker gene levels and/or composite biomarker values are selected to exhibit a positive or negative likelihood ratio of at least about 1.5 or more or about 0.67 or less, at least about 2 or more or about 0.5 or less, at least about 5 or more or about 0.2 or less, at least about 10 or more or about 0.1 or less, or at least about 20 or more or about 0.05 or less.
In certain embodiments, the biomarker gene levels and/or composite biomarker values are selected to exhibit an odds ratio of at least about 2 or more or about 0.5 or less, at least about 3 or more or about 0.33 or less, at least about 4 or more or about 0.25 or less, at least about 5 or more or about 0.2 or less, or at least about 10 or more or about 0.1 or less. In certain embodiments, biomarker gene levels and/or composite biomarker values are selected to exhibit an AUC ROC value of greater than 0.5, preferably at least 0.6, more preferably at least 0.7, still more preferably at least 0.8, even more preferably at least 0.9, and most preferably at least 0.95.
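As a non-limiting illustration of the quantities discussed above, the positive likelihood ratio, negative likelihood ratio, and odds ratio can be computed from a 2x2 table of test results. The sketch below uses the conventional definitions (sensitivity divided by 1-specificity, and so on); the function name and the counts are hypothetical and are not results of the present disclosure.

```python
def diagnostic_ratios(tp, fp, fn, tn):
    """Return LR+, LR-, and the diagnostic odds ratio from a 2x2 table.

    tp/fn: condition-group subjects with a positive/negative result.
    fp/tn: control-group subjects with a positive/negative result.
    Assumes every cell count is non-zero so that no division by zero occurs.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "lr_positive": sensitivity / (1.0 - specificity),
        "lr_negative": (1.0 - sensitivity) / specificity,
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical counts, for illustration only.
ratios = diagnostic_ratios(tp=90, fp=20, fn=10, tn=80)
print(ratios)
```

With these hypothetical counts the positive likelihood ratio is about 4.5 and the negative likelihood ratio is about 0.125, and their quotient equals the odds ratio of 36.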
In some cases, multiple thresholds can be determined in so-called “tertile,” “quartile,” or “quintile” analyses. In these methods, the “diseased” and “control” groups (or “high risk” and “low risk” groups) are considered together as a single population and are divided into 3, 4, or 5 (or more) “bins” having equal numbers of individuals. The boundaries between these “bins” can be considered “thresholds.” A risk (of a particular diagnosis or prognosis, for example) can be assigned based on which “bin” a test subject falls into. In some embodiments of the present methods, subjects are assigned to one of three bins, i.e., “low”, “intermediate”, or “high”, referring to the probability of undergoing MI based on the composite biomarker value obtained using the present methods. For example, subjects can be classified according to the estimated probability of undergoing MI into 3 bins: low likelihood (bin 1), intermediate likelihood (bin 2), and high likelihood (bin 3). The bins are defined, e.g., such that the likelihood ratios are <0.15 in bin 1, from 0.15 to 5 in bin 2, and >5 in bin 3.
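The three-bin assignment can be sketched as follows, as a non-limiting illustration. The sketch assumes that two cutoffs on the composite biomarker value have already been chosen, and it computes the likelihood ratio of each bin as the fraction of cases in the bin divided by the fraction of controls in the bin. The cutoffs, scores, and function names are hypothetical.

```python
def assign_bin(score, low_cutoff, high_cutoff):
    """Map a composite score to 'low', 'intermediate', or 'high'."""
    if score < low_cutoff:
        return "low"
    if score > high_cutoff:
        return "high"
    return "intermediate"


def bin_likelihood_ratios(case_scores, control_scores, low_cutoff, high_cutoff):
    """Likelihood ratio per bin: P(bin | case) / P(bin | control)."""
    ratios = {}
    for name in ("low", "intermediate", "high"):
        case_fraction = sum(
            assign_bin(s, low_cutoff, high_cutoff) == name for s in case_scores
        ) / len(case_scores)
        control_fraction = sum(
            assign_bin(s, low_cutoff, high_cutoff) == name for s in control_scores
        ) / len(control_scores)
        # An empty control bin gives an unbounded ratio.
        ratios[name] = case_fraction / control_fraction if control_fraction else float("inf")
    return ratios


# Hypothetical scores and cutoffs, for illustration only.
case_scores = [-0.5, 0.4, 0.8, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 3.0]
control_scores = [-2.0, -1.5, -1.2, -1.0, -0.8, -0.4, -0.1, 0.3, 0.6, 1.1]
bin_ratios = bin_likelihood_ratios(case_scores, control_scores, 0.0, 1.0)
print(bin_ratios)
```

In this toy example the cutoffs yield likelihood ratios below 0.15 in the low bin, between 0.15 and 5 in the intermediate bin, and above 5 in the high bin, matching the exemplary bin definitions given above.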
The phrases “assessing the likelihood” and “determining the likelihood,” as used herein, refer to methods by which the skilled artisan can predict the presence or absence of a condition (e.g., undergoing MI) in a patient. The skilled artisan will understand that this phrase includes within its scope an increased probability that a condition is present or absent in a patient; that is, that a condition is more likely to be present or absent in a subject. For example, the probability that an individual identified as having a specified condition actually has the condition can be expressed as a “positive predictive value” or “PPV.” Positive predictive value can be calculated as the number of true positives divided by the sum of the true positives and false positives. PPV is determined by the characteristics of the predictive methods of the present methods as well as the prevalence of the condition in the population analyzed. The statistical algorithms can be selected such that the positive predictive value in a population having a condition prevalence is in the range of 70% to 99% and can be, for example, at least 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
In other examples, the probability that an individual identified as not having a specified condition or outcome actually does not have that condition can be expressed as a “negative predictive value” or “NPV.” Negative predictive value can be calculated as the number of true negatives divided by the sum of the true negatives and false negatives. Negative predictive value is determined by the characteristics of the diagnostic or prognostic method, system, or code as well as the prevalence of the disease in the population analyzed. The statistical methods and models can be selected such that the negative predictive value in a population having a condition prevalence is in the range of about 70% to about 99% and can be, for example, at least about 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
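Because both predictive values depend on prevalence as well as on the characteristics of the test, the same sensitivity and specificity can give quite different PPV and NPV in different populations. The following non-limiting sketch applies the definitions above (true positives over all positives, true negatives over all negatives) to population fractions; the function name and the numeric inputs are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given condition prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics evaluated at two prevalences.
for prevalence in (0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.80, prevalence)
    print(f"prevalence={prevalence:.2f} PPV={ppv:.3f} NPV={npv:.3f}")
```

With these hypothetical inputs, raising the prevalence from 10% to 50% raises the PPV and lowers the NPV, which is why the ranges above are stated for a population having a given condition prevalence.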
In some embodiments, a subject is determined to have a significant probability of having or not having a specified condition or outcome. By “significant probability” is meant that the subject has a reasonable probability (0.6, 0.7, 0.8, 0.9 or more) of having, or not having, a specified condition or outcome.
In some embodiments, the composite biomarker value is combined with one or more clinical parameters. For example, a formula is used to combine (i) either the individual gene expression values or the output from a classifier that uses the gene expression values, with (ii) the one or more clinical parameter-based values or data, to generate (iii) a new value that is useful to the clinician.
The methods described herein may be used to determine whether the subject is undergoing MI, or can otherwise be used to characterize a sample, in order to correlate outputs of models used for the characterization with a state of MI. The likelihood of a subject undergoing MI can be used to make decisions about further medical care. In some embodiments, a determination of a high probability of a subject undergoing MI can indicate that the subject is a candidate for additional cardiovascular diagnostic testing. Examples of additional cardiovascular tests can include, but are not limited to, angiography (e.g., non-invasive computed tomography (CT) angiography or invasive coronary angiography (ICA)), myocardial perfusion imaging, echocardiography, and magnetic resonance imaging.
In some embodiments, a determination of a high probability of a subject undergoing MI can indicate that the subject is a candidate for further therapeutic interventions. Examples of therapeutic interventions can include, but are not limited to, administration of a pharmaceutical compound, an interventional procedure, or both. Examples of pharmaceutical compounds that can be administered to a subject determined to have a high likelihood of undergoing MI include, but are not limited to, an anticoagulant, an antiplatelet, a beta-blocker, a nitrate, a statin, an angiotensin-converting-enzyme (ACE) inhibitor, and an angiotensin receptor blocker (ARB). In some embodiments, a subject determined to have a high likelihood of undergoing MI can further receive an interventional procedure, such as revascularization (e.g., a percutaneous coronary intervention (PCI) or a coronary artery bypass graft (CABG)).
In some embodiments, a determination of a low probability of a subject undergoing MI can indicate that the subject is a candidate for further evaluation by a physician for other conditions. For example, a subject with a low probability for undergoing MI can be evaluated by a physician for conditions such as anxiety disorder, aortic dissection, aortic stenosis, asthma, esophagitis, heart failure, gastroenteritis, left ventricular embolism, musculoskeletal disorder, myocarditis, pericarditis, pneumonia, pulmonary embolism, and unstable angina.
In either case, the likelihood of a subject undergoing MI can be used to inform decision making, for example, regarding whether to admit the subject to the medical facility (e.g., a clinic or emergency department) or to release the subject. In some embodiments, regardless of the likelihood of a subject undergoing MI, the subject is likely to receive medical care for the conditions or symptoms that the subject presents, e.g., at the time of entering the clinic or emergency department, either in conjunction with further diagnostic testing or not. As used herein, “medical care” comprises any action taken with respect to the treatment of the subject, whether in an emergency room, urgent care context, another clinical facility or context, or at home, in order to alleviate, eliminate, slow the progression of, or in any way improve any aspect or symptom, including, but not limited to, administering a therapeutic drug, performing surgery, and assisting with symptom management.
Transcriptomic data of blood samples from individuals with MI and from controls drawn from an at-risk population with CAD or stable/unstable angina were analyzed using a multi-cohort analysis of 4 clinically meaningful datasets. This analysis allowed for the identification of 480 mRNAs that can robustly distinguish MI samples from these control samples with a good accuracy suitable for the clinical setting. Subsets of the 480 identified mRNA biomarkers can be used in developing a new diagnostic test on an established assay system. Such a test, when deployed in clinics and EDs, can enable clinicians to make better decisions and augment the current gold standard for triaging MI.
The Gene Expression Omnibus (GEO) and ArrayExpress were surveyed for datasets with transcriptomic data (Homo sapiens) from either whole blood or peripheral blood mononuclear cells (PBMCs) relevant to cardiovascular diseases, specifically ischemia. As inclusion criteria, datasets that had genome-wide blood-based transcriptomic data and closely mimicked the clinical setting were selected. All datasets were manually curated and 4 were identified. These datasets included both MI samples collected at admission, which can be used as cases, and coronary artery disease (CAD), stable angina (SA), or unstable angina (UA) samples, which can be used as controls against which MI would be discriminated. There were 8 additional datasets that either had MI samples but used healthy individuals as controls or had MI samples from a later or unknown time point. To better mimic the clinical situation, these 8 datasets were excluded. With the 4 selected datasets of clinical usefulness, there were 80 controls (CAD, SA, or UA) and 193 MI samples (Table 1). Note that in GSE59867, 39 redundant samples (28 cases and 11 controls) that were already included in GSE62646 were identified empirically based on their transcriptomic profiles and were removed from the larger set, GSE59867 (instead of the original dataset with 111 cases and 46 controls, 83 cases and 35 controls without the confirmed redundant profiles were used). Note that MI samples were from either ST-elevated myocardial infarction (STEMI) or non-ST-elevated myocardial infarction (NSTEMI) events.
Multi-cohort analysis: The four transcriptomic datasets were downloaded from GEO together with their phenotypic data. A well-established multi-cohort analysis (ref. 6) was performed on these 4 datasets using the MetaIntegrator package (v2.1.1) in R. Briefly, the effect size (ES) was calculated for each gene within a study between cases (MI samples) and controls (CAD/UA/SA samples) as Hedges' g. The pooled ES across all datasets was computed using the DerSimonian & Laird random-effects model. After summarizing the effect size, p-values across all mRNAs were corrected for multiple testing based on the Benjamini-Hochberg false discovery rate (FDR). Fisher's sum of logs method was used for combining p-values across studies. The log-sum of p-values that each gene is over- or under-regulated was computed along with corresponding p-values. Again, the Benjamini-Hochberg method was used to correct for multiple testing across all mRNAs. Leave-one-study-out (LOSO) analysis was performed by removing one dataset at a time in the discovery.
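For orientation only, the three statistical steps named above can be sketched in a few lines. This is not the code of the R package used in the analysis; it is a minimal sketch using one common textbook formulation of Hedges' g and its variance, the DerSimonian and Laird estimator, and Fisher's sum of logs. All function names are hypothetical, and at least two studies are assumed for pooling.

```python
import math
from statistics import fmean, variance


def hedges_g(cases, controls):
    """Hedges' g and its approximate variance for one gene in one study."""
    n1, n0 = len(cases), len(controls)
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(cases) + (n0 - 1) * variance(controls)) / (n1 + n0 - 2)
    )
    d = (fmean(cases) - fmean(controls)) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * (n1 + n0) - 9.0)  # small-sample correction
    g = correction * d
    var_g = correction ** 2 * ((n1 + n0) / (n1 * n0) + d ** 2 / (2.0 * (n1 + n0)))
    return g, var_g


def dersimonian_laird(effects, variances):
    """Pooled effect, its standard error, and tau^2 (needs >= 2 studies)."""
    weights = [1.0 / v for v in variances]
    fixed = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    q = sum(w * (g - fixed) ** 2 for w, g in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * g for w, g in zip(re_weights, effects)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2


def fisher_combined_p(p_values):
    """Fisher's sum of logs: -2*sum(ln p) is chi-square with 2k degrees of freedom.

    For an even number of degrees of freedom the upper tail has a closed form.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )
```

In such a sketch, `hedges_g` would be applied per gene and per dataset, and its outputs passed to `dersimonian_laird` to obtain the pooled ES for that gene.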
Identification of signature mRNAs: An effect size (ES) threshold of ≥0.6 or ≤−0.6, in conjunction with FDR ≤0.05, was used to filter differentially expressed mRNAs in the multi-cohort analysis for over-expressed or under-expressed genes, respectively. This threshold was empirically chosen to correspond to 80% power for moderate heterogeneity.
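The filtering step can be sketched as follows, as a non-limiting illustration. The sketch assumes that a pooled effect size and a raw p-value are available for each gene, applies a standard Benjamini-Hochberg adjustment, and then applies the effect size and FDR cutoffs. The gene names, values, and function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_signature(genes, es_cutoff=0.6, fdr_cutoff=0.05):
    """genes maps a gene name to (pooled effect size, raw p-value)."""
    names = list(genes)
    fdr = benjamini_hochberg([genes[name][1] for name in names])
    over = [n for n, q in zip(names, fdr) if q <= fdr_cutoff and genes[n][0] >= es_cutoff]
    under = [n for n, q in zip(names, fdr) if q <= fdr_cutoff and genes[n][0] <= -es_cutoff]
    return over, under


# Hypothetical genes and statistics, for illustration only.
toy_genes = {
    "G1": (0.9, 0.001),
    "G2": (-0.8, 0.004),
    "G3": (0.7, 0.20),
    "G4": (0.2, 0.002),
}
over_expressed, under_expressed = select_signature(toy_genes)
print(over_expressed, under_expressed)
```

In the toy input, G3 passes the effect size cutoff but fails the FDR cutoff, and G4 passes the FDR cutoff but fails the effect size cutoff, so neither is retained.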
Definition of composite biomarker value or biomarker score for the classifier: A classifier value of a sample was evaluated as the geometric mean of the normalized, log2-transformed expression of the up-regulated mRNAs minus that of the down-regulated mRNAs from the final gene signature. The values were scaled for comparison between datasets and used for receiver operating characteristic (ROC) curve and area under the curve (AUC) analyses as performance metrics of the selected biomarkers.
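A minimal sketch of this score is given below for illustration. It assumes that the normalized expression values are strictly positive (a geometric mean is otherwise undefined) and that scaling is done as a z-score within a dataset; the scaling choice, the gene names, and the function names are assumptions made for the sketch and are not stated in the text above.

```python
from statistics import fmean, geometric_mean, stdev


def composite_score(sample, up_genes, down_genes):
    """Geometric mean of up-regulated genes minus that of down-regulated genes.

    sample maps gene name -> normalized, log2-transformed expression (> 0).
    """
    up = geometric_mean([sample[g] for g in up_genes])
    down = geometric_mean([sample[g] for g in down_genes])
    return up - down


def scale_within_dataset(scores):
    """Center and scale scores so that datasets can be compared (assumed z-score)."""
    center, spread = fmean(scores), stdev(scores)
    return [(s - center) / spread for s in scores]


# Hypothetical sample, for illustration only.
sample = {"UP_A": 4.0, "UP_B": 9.0, "DOWN_A": 2.0, "DOWN_B": 8.0}
score = composite_score(sample, ["UP_A", "UP_B"], ["DOWN_A", "DOWN_B"])
print(score)
print(scale_within_dataset([1.0, 2.0, 3.0]))
```

The scaled scores can then be passed to any ROC routine together with the case and control labels of the dataset.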
Down-selection of parsimonious mRNAs: Additionally, from the final biomarker signature, a smaller mRNA set with a similarly robust or optimized performance was identified using a greedy forward search algorithm. Briefly, starting with the gene signature list, a score, the ground truth, and a stopping threshold (0.1), the forward search computes the score for each gene individually and chooses the gene with the highest weighted AUROC across datasets. In subsequent iterations, each of the remaining genes is added to the model one at a time, and the gene that provides the greatest increase in weighted AUROC is retained. Once the iterative increase in weighted AUROC falls below the stopping threshold (i.e., the addition of any gene from the list no longer increases the total weighted AUROC by more than the threshold), the forward search terminates, resulting in the final gene list. We defined the weighted AUROC as the sum of each dataset's AUROC multiplied by its number of samples.
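The forward search described above can be sketched as follows. This is an illustrative reading of the procedure, not the code used in the study: the data layout, the function names, and the toy dataset are hypothetical, the score is the geometric-mean difference defined earlier, and the weighted AUROC is implemented literally as the sum over datasets of AUROC multiplied by sample count.

```python
from statistics import geometric_mean


def auroc(scores, labels):
    """AUROC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def signature_scores(dataset, genes, direction):
    """Per-sample score: geometric mean of up genes minus that of down genes."""
    scores = []
    for i in range(len(dataset["labels"])):
        up = [dataset["expr"][g][i] for g in genes if direction[g] > 0]
        down = [dataset["expr"][g][i] for g in genes if direction[g] < 0]
        scores.append(
            (geometric_mean(up) if up else 0.0) - (geometric_mean(down) if down else 0.0)
        )
    return scores


def weighted_auroc(datasets, genes, direction):
    """Sum over datasets of AUROC multiplied by the number of samples."""
    return sum(
        auroc(signature_scores(d, genes, direction), d["labels"]) * len(d["labels"])
        for d in datasets
    )


def forward_search(datasets, candidates, direction, stop_threshold=0.1):
    """Greedily add the gene with the largest gain until the gain <= threshold."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        trial = {g: weighted_auroc(datasets, selected + [g], direction) for g in remaining}
        gene = max(trial, key=trial.get)
        if trial[gene] - best <= stop_threshold:
            break
        selected.append(gene)
        remaining.remove(gene)
        best = trial[gene]
    return selected


# Hypothetical toy dataset (1 = MI, 0 = control), for illustration only.
toy = {
    "labels": [1, 1, 1, 0, 0, 0],
    "expr": {
        "UP1": [8.0, 9.0, 5.0, 4.0, 6.0, 3.0],
        "UP2": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
        "DOWN1": [3.0, 2.0, 1.0, 4.0, 2.5, 5.0],
    },
}
direction = {"UP1": 1, "UP2": 1, "DOWN1": -1}
selected = forward_search([toy], ["UP1", "UP2", "DOWN1"], direction)
print(selected)
```

In the toy data neither informative gene separates the groups perfectly on its own, the pair does, and the uninformative gene adds nothing, so the search stops after two genes.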
Selection of mRNA biomarker signature: Differential expression was assessed at an ES threshold of ≥0.6 (over-expressed) or ≤−0.6 (under-expressed) for the MI group versus the control group, in conjunction with FDR ≤0.05, for the multi-cohort analysis (Table 2). This threshold was empirically chosen as it corresponds to 80% power for moderate heterogeneity. At |ES| ≥0.6 and FDR ≤0.05, we identified 480 differentially expressed mRNAs in 2 or more of the 4 studies among a total of 18,271 genes covered. We decided to use the 480-mRNA list as our biomarker candidate base. Of these mRNAs, 337 were over-expressed and 143 were under-expressed in MI samples as compared to controls. The effect size and all other parameters are given in Table 2 for each of the identified 480 mRNAs.
Table 2. The list of 480 mRNA biomarkers that distinguish MI from control samples. These mRNAs have an absolute effect size ≥0.6 and FDR ≤0.05 and have been observed in 2 or more of the 4 studies. The last column indicates the inclusion flag of the 15 parsimonious genes from the forward search.
Performance of individual mRNA and two-mRNA combinations: First, AUCs for these ROCs were determined for each of all measured mRNAs (18,271 total) across the 4 studies in order to understand the background characteristics (
Table 3. Performance of top 2-mRNA combinations. We used pairwise gene combinations (114,960 combinations) to evaluate the performance. The top 39 combinations with AUC ≥0.85 in all 4 datasets are listed here.
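The number of pairwise combinations follows directly from choosing 2 of the 480 signature mRNAs, i.e., 480 × 479 / 2. The short check below confirms the figure and shows, using placeholder indices rather than real gene names, how the pairs could be enumerated for evaluation.

```python
import math
from itertools import combinations

signature_size = 480
n_pairs = math.comb(signature_size, 2)  # 480 * 479 / 2
print(n_pairs)  # 114960

# Enumerate every unordered pair of (placeholder) gene indices.
enumerated = sum(1 for _ in combinations(range(signature_size), 2))
print(enumerated == n_pairs)
```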
Performance of 480 mRNA biomarker set: The geometric mean value calculated based on the description in Methods for the 480 selected mRNAs was significantly higher for MI samples as compared to the control samples in all datasets (
A parsimonious set of mRNA signature: The greedy forward search using the 480 mRNAs set as input yielded an optimized set of 15 mRNAs which can be more amenable for translating to clinical use in a clinically deployable platform. The forward search mRNA signature has 10 over-expressed mRNAs and 5 under-expressed mRNAs. The geometric mean value based on the 15 mRNAs effectively distinguishes MI samples from control samples (
Performance summary of mRNA biomarker set: The classifier performance based on the 480 signature mRNAs and the 15 forward search mRNAs are summarized in
Specificity of mRNA biomarker set: Finally, ROCs and corresponding AUCs based on either the 480 mRNA signature (A) or the 15 mRNAs signature (B) with 10 repetitions of randomly permuted class labels showed that our mRNA signature is specific at distinguishing MI samples from controls (
Acute MI is the leading cause of death among ischemic heart diseases. An early and accurate diagnosis can provide timely treatment and reduce unwanted medical expenses. A blood mRNA-based test to effectively identify MI patients is a molecular test that could address this unmet clinical need.
Using a well-established multi-cohort analysis method, a multi-mRNA signature consisting of 480 mRNAs was discovered from four available datasets, and its robust performance in discriminating MI from controls was demonstrated. Additionally, using a greedy forward search algorithm, a parsimonious set of 15 mRNAs with a similar or even better performance in distinguishing MI from controls was identified. Furthermore, it is illustrated that a reasonably good performance can be achieved with 2-mRNA pairs, and further combinations thereof, drawn from the list of 480 genes included in this report.
These findings support the prospect that an accurate diagnostic test can be developed for clinical use with subsets of the identified mRNA biomarkers from the blood transcriptomic profile of patients with acute MI. Such a diagnostic test, when deployed in clinics on a rapid point-of-care platform with a turnaround time of 30 minutes or less, can enable clinicians to make an accurate diagnosis of MI. Such a diagnosis is often challenging when troponin measurements and ECG are inconclusive, and it then requires a series of sequential tests and procedures that are costly and time- and resource-consuming. The diagnostic test envisioned here can help clinicians make better and more timely triaging decisions, save patient lives, and reduce healthcare costs.
In one aspect, kits are provided for the determination of the likelihood of a subject undergoing MI, wherein the kits can be used to detect the biomarkers described herein. For example, the kits can be used to detect any one or more of the biomarkers described herein, which are differentially expressed in biological samples (e.g., whole blood or components thereof) from subjects that are undergoing MI and from subjects that are not undergoing MI. The kit may include one or more agents for the detection of biomarkers; a container for holding a biological sample isolated from a human subject; and printed instructions for reacting agents with the biological sample or a portion of the biological sample to detect the presence or amount of at least one biomarker in the biological sample. The agents may be packaged in separate containers. The kit may further comprise one or more control reference samples, e.g., reference samples from control subjects, and reagents for performing a PCR, isothermal amplification, immunoassay, NanoString, or microarray analysis. The kit may also comprise one or more devices or implements for carrying out any of the methods described herein, e.g., 96-well plates, microfluidic cartridges, single-well multiplex assays, etc.
In certain embodiments, the kit comprises agents, e.g., primers and/or probes, for measuring the levels of one or more biomarkers (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 95, 100, 200, 300, or 400 biomarkers) listed in Table 2.
In certain embodiments, the kit comprises agents, e.g., primers and/or probes, for measuring the levels of one or more pairs of biomarkers (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 pairs of biomarkers) listed in Table 3.
In certain embodiments, the kit comprises agents, e.g., primers and/or probes, for measuring the levels of one or more biomarkers (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 biomarkers) selected from the group consisting of SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3.
In certain embodiments, the kit comprises agents, e.g., primers and/or probes, for measuring the levels of one or more biomarkers (e.g., 1, 2, 3, 4, 5, 6, 7, or 8 biomarkers) selected from the group consisting of SOCS3, SCX, AQP9, IPPK, CD163, ATP1B1, ZCCHC18, and CXorf65.
In certain embodiments, the kit comprises agents, e.g., primers and/or probes, for measuring the levels of one or more biomarkers (e.g., 1, 2, 3, 4, 5, 6, or 7 biomarkers) selected from the group consisting of OVCA2, MEF2A, PRR33, TNFSF8, MEF2B, HINT2, and IMP3.
In certain embodiments, the kit comprises a microarray or other solid support for analysis of a plurality of biomarker polynucleotides. An exemplary microarray or other support included in the kit comprises one or more oligonucleotides that hybridize to one or more biomarkers (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 95, 100, 200, 300, or 400 biomarkers) listed in Table 2. An exemplary microarray or other support included in the kit comprises one or more oligonucleotides that hybridize to one or more biomarkers (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 biomarkers) selected from the group consisting of SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3. An exemplary microarray or other support included in the kit comprises one or more oligonucleotides that hybridize to one or more biomarkers (e.g., 1, 2, 3, 4, 5, 6, 7, or 8 biomarkers) selected from the group consisting of SOCS3, SCX, AQP9, IPPK, CD163, ATP1B1, ZCCHC18, and CXorf65. An exemplary microarray or other support included in the kit comprises one or more oligonucleotides that hybridize to one or more biomarkers (e.g., 1, 2, 3, 4, 5, 6, or 7 biomarkers) selected from the group consisting of OVCA2, MEF2A, PRR33, TNFSF8, MEF2B, HINT2, and IMP3.
The kit can be designed for use with a specific detection system or technique, such as polymerase chain reaction (PCR) (e.g., quantitative PCR (qPCR), droplet digital PCR (ddPCR), reverse transcription PCR (RT-PCR), quantitative RT-PCR (qRT-PCR)), isothermal amplification (e.g., loop-mediated isothermal amplification (LAMP), reverse transcription LAMP (RT-LAMP), quantitative RT-LAMP (qRT-LAMP)), RPA amplification, ligase chain reaction, branched DNA amplification, nucleic acid sequence-based amplification (NASBA), strand displacement assay (SDA), transcription-mediated amplification, rolling circle amplification (RCA), helicase-dependent amplification (HDA), single primer isothermal amplification (SPIA), nicking and extension amplification reaction (NEAR), transcription mediated assay (TMA), CRISPR-Cas detection, or direct hybridization without amplification onto a functionalized surface (e.g., using a graphene biosensor). In particular embodiments, the kit can be designed for use with qRT-PCR or qRT-LAMP. The kit can contain additional materials needed for the specific detection system or technique.
The kit can comprise one or more containers for compositions contained in the kit. Compositions can be in liquid form or can be lyophilized. Suitable containers for the compositions include, for example, bottles, vials, syringes, and test tubes. Containers can be formed from a variety of materials, including glass or plastic. The kit can also comprise a package insert containing written instructions for methods of determining the composite biomarker value.
In one aspect, a measurement system is provided. Such systems allow, e.g., the detection of biomarker gene expression in a sample and the recording of the data resulting from the detection. The stored data can then be analyzed as described elsewhere herein to determine the composite biomarker value of a subject. Such systems can comprise assay systems (e.g., comprising an assay device and detector), which can transmit data to a logic system (such as a computer or other system or device for capturing, transforming, analyzing, or otherwise processing data from the detector). The logic system can have any one or more of multiple functions, including controlling elements of the overall system such as the assay system, sending data or other information to a storage device or external memory, and/or issuing commands to a treatment device.
An exemplary measurement system is shown in
Certain aspects of the herein-described methods may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments are directed to computer systems configured to perform the steps of methods described herein, potentially with different components performing a respective step or a respective group of steps. The computer systems of the present disclosure can be part of a measuring system as described above, or can be independent of any measuring systems. In some embodiments, the present disclosure provides a computer system that calculates a composite biomarker value based on inputted biomarker expression (and optionally other) data, and determines the likelihood of a subject undergoing MI.
An exemplary computer system is shown in
In one aspect, the present disclosure provides a computer implemented method for determining the likelihood or probability of a subject undergoing MI. The computer performs steps comprising, e.g., receiving inputted patient data comprising values for the levels of one or more biomarkers in a biological sample from the patient; analyzing the levels of one or more biomarkers and optionally comparing them to respective reference values, e.g., to a housekeeping reference gene for normalization; calculating a composite biomarker value for the patient based on the levels of the biomarkers and comparing the value to one or more threshold values to assign the patient to a category (i.e., undergoing MI or not undergoing MI); and displaying information regarding the likelihood or probability of MI. In certain embodiments, the inputted patient data comprises values for the levels of a plurality of biomarkers in a biological sample from the patient, e.g., biomarkers comprising one or more biomarkers listed in Table 2 or one or more of biomarkers selected from the group consisting of SOCS3, OVCA2, MEF2A, PRR33, SCX, AQP9, IPPK, TNFSF8, CD163, MEF2B, HINT2, ATP1B1, ZCCHC18, CXorf65, and IMP3 (e.g., SOCS3, SCX, AQP9, IPPK, CD163, ATP1B1, ZCCHC18, and CXorf65; or e.g., OVCA2, MEF2A, PRR33, TNFSF8, MEF2B, HINT2, and IMP3).
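The sequence of steps recited above (receive biomarker levels, normalize to a reference gene, calculate a composite value, compare to a threshold, and report) can be sketched as follows. This is a non-limiting illustration under stated assumptions: the gene identifiers are placeholders rather than biomarkers of Table 2, normalization is a simple subtraction of the reference gene level on a log2 scale, the composite is an arithmetic difference of means (a simplification chosen because normalized values can be negative), and the threshold is hypothetical.

```python
from statistics import fmean

# Placeholder identifiers, for illustration only.
UP_GENES = ("UP_A", "UP_B")
DOWN_GENES = ("DOWN_A",)
REFERENCE_GENE = "REF_HK"


def composite_value(levels):
    """Normalize log2 levels to the reference gene, then combine into one value."""
    reference = levels[REFERENCE_GENE]
    normalized = {g: v - reference for g, v in levels.items() if g != REFERENCE_GENE}
    up = fmean([normalized[g] for g in UP_GENES])
    down = fmean([normalized[g] for g in DOWN_GENES])
    return up - down


def classify(levels, threshold):
    """Assign a category by comparing the composite value to a threshold."""
    value = composite_value(levels)
    category = "undergoing MI" if value >= threshold else "not undergoing MI"
    return {"composite": value, "category": category}


# Hypothetical inputted patient data and threshold.
patient_levels = {"UP_A": 9.0, "UP_B": 8.0, "DOWN_A": 5.0, "REF_HK": 6.0}
report = classify(patient_levels, threshold=2.0)
print(report)
```

A deployed system would replace the placeholder genes, the combination rule, and the threshold with those established for the chosen biomarker set and assay platform, and could use two thresholds to report low, intermediate, and high categories.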
In a further aspect, a diagnostic system is included for performing the computer implemented method, as described. A diagnostic system may include a computer containing a processor, a storage component (i.e., memory), a display component, and other components typically present in general purpose computers. The storage component stores information accessible by the processor, including instructions that may be executed by the processor and data that may be retrieved, manipulated or stored by the processor.
The storage component includes instructions for determining the MI status (i.e., undergoing MI or not undergoing MI) of the subject. For example, the storage component includes instructions for calculating the composite biomarker value for the subject based on biomarker expression levels, as described herein. In addition, the storage component may further comprise instructions for performing multivariate linear discriminant analysis (LDA), receiver operating characteristic (ROC) analysis, principal component analysis (PCA), ensemble data mining methods, cell specific significance analysis of microarrays (csSAM), or multi-dimensional protein identification technology (MUDPIT) analysis. The computer processor is coupled to the storage component and configured to execute the instructions stored in the storage component in order to receive patient data and analyze patient data according to one or more algorithms. The display component displays information regarding the diagnosis of the patient. The storage component may be of any type capable of storing information accessible by the processor, such as a hard-drive, memory card, ROM, RAM, DVD, CD-ROM, USB Flash drive, write-capable, and read-only memories.
The instructions may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor. In that regard, the terms “instructions,” “steps” and “programs” may be used interchangeably herein. The instructions may be stored in object code form for direct processing by the processor, or in any other computer language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance.
Data may be retrieved, stored or modified by the processor in accordance with the instructions. For instance, although the diagnostic system is not limited by any particular data structure, the data may be stored in computer registers, in a relational database as a table having a plurality of different fields and records, XML documents, or flat files. The data may also be formatted in any computer-readable format such as, but not limited to, binary values, ASCII or Unicode. Moreover, the data may comprise any information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories (including other network locations) or information which is used by a function to calculate the relevant data. In certain embodiments, the processor and storage component may comprise multiple processors and storage components that may or may not be stored within the same physical housing. For example, some of the instructions and data may be stored on removable CD-ROM and others within a read-only computer chip. Some or all of the instructions and data may be stored in a location physically remote from, yet still accessible by, the processor. Similarly, the processor may actually comprise a collection of processors which may or may not operate in parallel. In one aspect, the computer is a server communicating with one or more client computers. Each client computer may be configured similarly to the server, with a processor, storage component and instructions. Although the client computers may comprise a full-sized personal computer, many aspects of the system and method are particularly advantageous when used in connection with mobile devices capable of wirelessly exchanging data with a server over a network such as the Internet.
The above description of example embodiments of the present disclosure has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form described, and many modifications and variations are possible in light of the teaching above.
The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the disclosure. However, other embodiments of the disclosure may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.
A recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary. The use of “or” is intended to mean an “inclusive or,” and not an “exclusive or” unless specifically indicated to the contrary. Reference to a “first” component does not necessarily require that a second component be provided. Moreover, reference to a “first” or a “second” component does not limit the referenced component to a particular location unless expressly stated. The term “based on” is intended to mean “based at least in part on.”
All patents, patent applications, publications, and descriptions mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art. Where a conflict exists between the instant application and a reference provided herein, the instant application shall dominate.
When a group of substituents is disclosed herein, it is understood that all individual members of those groups and all subgroups and classes that can be formed using the substituents are disclosed separately. When a Markush group or other grouping is used herein, all individual members of the group and all combinations and subcombinations possible of the group are intended to be individually included in the disclosure. As used herein, “and/or” means that one, all, or any combination of items in a list separated by “and/or” are included in the list; for example “1, 2 and/or 3” is equivalent to “‘1’ or ‘2’ or ‘3’ or ‘1 and 2’ or ‘1 and 3’ or ‘2 and 3’ or ‘1, 2 and 3’”. Whenever a range is given in the specification, for example, a temperature range, a time range, or a composition range, all intermediate ranges and subranges, as well as all individual values included in the ranges given are intended to be included in the disclosure.
The FIGURES illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to preferred embodiments, example configurations, and variations thereof. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block can occur out of the order noted in the FIGURES. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As a person skilled in the art will recognize from the previous detailed description and from the figures and claims, modifications and changes can be made to the preferred embodiments of the invention without departing from the scope of this invention defined in the following claims.
This application is a U.S. national stage filing under 35 U.S.C. § 371 of International Application No. PCT/US2023/014600, filed on Mar. 6, 2023, which claims priority to U.S. Provisional Application No. 63/325,447, filed on Mar. 30, 2022, the disclosure of each of which is hereby incorporated by reference in its entirety for all purposes.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2023/014600 | 3/6/2023 | WO | |

Number | Date | Country |
---|---|---|
63325447 | Mar 2022 | US |