The present invention relates to a disease-specific metabolite profile, and particularly to a biomarker composition obtained by screening from blood plasma-specific metabolite profiles of coronary heart disease subjects. The present invention also relates to a use of the biomarker compositions in risk assessment, diagnosis, early diagnosis, or pathological staging of coronary heart disease, and to a method for risk assessment, diagnosis, early diagnosis, or pathological staging of coronary heart disease.
Coronary artery heart disease (CAHD), also known as ischemic heart disease (IHD) or coronary heart disease for short, is one of the most common heart diseases. It refers to dysfunction and/or organic pathological changes of the cardiac muscle caused by coronary artery stenosis or insufficient blood supply. In 2012, it was the leading cause of death in the world[1] and one of the major reasons for hospitalization[2]. Coronary heart disease may occur at any age, even in children, but onset is most common in middle age, and incidence increases with age. Nearly 17 million people die from atherosclerotic heart diseases every year worldwide, and deaths are estimated to increase by 50% by 2020, reaching 25 million per year and accounting for one third of deaths in the world. In China, 2.5 million people die from cardiovascular diseases per year, and new myocardial infarctions occur in 500,000 people per year. The occurrence of coronary heart disease shows significant regional differences, being generally higher in the northern cities than in the southern cities, and significant gender differences, with a ratio of men to women of 2~5:1. The data show similar differences in the distribution of coronary heart disease among patients worldwide[3]. At present, the diagnosis of coronary heart disease still lacks a uniform standard, and the existing diagnostic methods such as electrocardiogram, electrocardiogram stress test, dynamic electrocardiogram, radionuclide myocardial imaging, echocardiography, hematological examination, coronary CT, coronary angiography and intravascular imaging techniques all have shortcomings. For example, the observation of symptoms, echocardiography and the like are strongly subjective, while coronary CT, coronary angiography and intravascular imaging techniques are invasive and cause additional pain to patients.
The diagnosis using the single markers that have been found in blood has disadvantages such as poor sensitivity and specificity, and high false positive rate. It is of great significance to develop a noninvasive, specific and accurate method for the diagnosis of coronary heart disease[4,5].
Metabolomics is a systems biology discipline, developed after genomics and proteomics, that studies the species, quantities and variations of endogenous metabolites in a subject under the influence of internal or external factors. Metabolomics analyzes the whole metabolic profile of an organism and explores the relationships between metabolites and physiological and pathological changes, so as to provide a basis for the diagnosis of diseases. Therefore, it is of great significance to screen metabolic markers associated with coronary heart disease, and in particular to use a combination of multiple metabolic markers, for the metabolomics research, clinical diagnosis and treatment of coronary heart disease.
Aiming at the shortcomings such as trauma and invasion of the existing diagnostic methods for coronary artery diseases, the problem to be solved by the present invention is to provide a biomarker combination (i.e., a biomarker composition) that can be used for the diagnosis and risk assessment of coronary heart disease, and a method for diagnosis and risk assessment of coronary heart disease.
In the present invention, liquid chromatography-mass spectrometry is used for analyzing the metabolite profiles of blood plasma samples of the coronary heart disease group and the control group, and pattern recognition is used for analyzing and comparing the metabolite profiles of the coronary heart disease group and the control group, so as to determine specific liquid chromatography-mass spectrometry data and corresponding specific biomarkers, which provide a basis for the subsequent theoretical research and clinical diagnosis.
The first aspect of the present invention relates to a biomarker composition, comprising one or more biomarkers selected from the following Biomarkers 1 to 6:
Biomarker 1, which has a mass-to-charge ratio of 310.04±0.4 amu, and a retention time of 611.25±60 s;
Biomarker 2, which has a mass-to-charge ratio of 311.05±0.4 amu, and a retention time of 611.26±60 s;
Biomarker 3, which has a mass-to-charge ratio of 220.00±0.4 amu, and a retention time of 122.77±60 s;
Biomarker 4, which has a mass-to-charge ratio of 247.09±0.4 amu, and a retention time of 146.37±60 s;
Biomarker 5, which has a mass-to-charge ratio of 255.03±0.4 amu, and a retention time of 117.92±60 s; and
Biomarker 6, which has a mass-to-charge ratio of 170.03±0.4 amu, and a retention time of 202.18±60 s;
for example, comprising 1, 2, 3, 4, 5 or 6 of these biomarkers.
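The way a detected peak could be assigned to one of Biomarkers 1 to 6 can be sketched as follows. This is a minimal illustration, assuming that a peak is described only by its mass-to-charge ratio and retention time; the names `Biomarker`, `BIOMARKERS` and `match_biomarker`, and the tie-breaking rule for a peak that falls inside more than one window, are hypothetical and are not part of the described method.

```python
from typing import NamedTuple, Optional


class Biomarker(NamedTuple):
    name: str
    mz: float    # nominal mass-to-charge ratio (amu)
    rt_s: float  # nominal retention time (seconds)


# Nominal values copied from the list of Biomarkers 1 to 6 above.
BIOMARKERS = [
    Biomarker("Biomarker 1", 310.04, 611.25),
    Biomarker("Biomarker 2", 311.05, 611.26),
    Biomarker("Biomarker 3", 220.00, 122.77),
    Biomarker("Biomarker 4", 247.09, 146.37),
    Biomarker("Biomarker 5", 255.03, 117.92),
    Biomarker("Biomarker 6", 170.03, 202.18),
]


def match_biomarker(mz: float, rt_s: float,
                    mz_tol: float = 0.4, rt_tol: float = 60.0) -> Optional[str]:
    """Return the biomarker whose m/z and retention-time windows contain the peak."""
    candidates = [
        b for b in BIOMARKERS
        if abs(mz - b.mz) <= mz_tol and abs(rt_s - b.rt_s) <= rt_tol
    ]
    if not candidates:
        return None
    # Assumed tie-break (not from the source): prefer the candidate whose
    # worst relative deviation from its nominal values is smallest.
    best = min(
        candidates,
        key=lambda b: max(abs(mz - b.mz) / mz_tol, abs(rt_s - b.rt_s) / rt_tol),
    )
    return best.name
```

For instance, a peak at m/z 310.10 and 600 s falls inside the window of Biomarker 1, whereas a peak at m/z 220.10 and 300 s matches nothing because its retention time lies outside the ±60 s window of Biomarker 3.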
In one embodiment of the present invention, the characteristics of the above six biomarkers are shown in Table 1.
In one embodiment of the present invention, the biomarker composition comprises at least Biomarkers 1 to 3 and 6; optionally, further comprises Biomarker 4 and/or Biomarker 5.
In one embodiment of the present invention, the biomarker composition comprises Biomarkers 1 to 6.
In one embodiment of the present invention, the biomarker composition comprises Biomarkers 3 to 6.
The second aspect of the present invention relates to a reagent composition, comprising a reagent for detecting the biomarker composition according to the first aspect of the present invention.
In the present invention, the reagent for detecting the biomarker is, for example, a ligand such as an antibody that can bind to the biomarker; optionally, the reagent for detection may also have a detectable label. The reagent composition is a combination of all detection reagents.
The third aspect of the present invention relates to a use of the biomarker composition according to the first aspect and/or the reagent composition according to the second aspect of the present invention in manufacture of a kit, in which the kit is used for risk assessment, diagnosis, early diagnosis or pathological staging of coronary heart disease.
In an embodiment of the present invention, the kit further comprises training set data for the contents of the biomarker composition according to the first aspect of the present invention in a coronary heart disease subject and a normal subject.
In one embodiment of the present invention, the training set data are shown in Table 2.
The present invention also relates to a method for risk assessment, diagnosis, early diagnosis or pathological staging of coronary heart disease, comprising a step of determining content of each biomarker of the biomarker composition according to the first aspect of the present invention in a sample (e.g., blood plasma, whole blood) of a subject.
In one embodiment of the present invention, a liquid chromatography-mass spectrometry method is used for determining the content of each biomarker of the biomarker composition according to the first aspect of the present invention in the sample (e.g., blood plasma, whole blood) of the subject.
In one embodiment of the present invention, the method further comprises a step of establishing a training set for contents of the biomarker composition according to the first aspect of the present invention in samples (e.g., blood plasma, whole blood) of a coronary heart disease subject and a normal subject (control group).
In one embodiment of the present invention, the training set is established by using a multivariate statistical classification model (e.g., a random forest model).
In one embodiment of the present invention, the training set comprises data as shown in Table 2.
In one embodiment of the present invention, the method further comprises a step of comparing the content of each biomarker of the biomarker composition according to the first aspect of the present invention in the sample (e.g., blood plasma, whole blood) of the subject to the data of training set of the biomarker compositions of the coronary heart disease subject and the normal subject.
In one embodiment of the present invention, the step of comparing the content of each biomarker is carried out by using a receiver operating characteristic curve (ROC).
In one embodiment of the present invention, the result is interpreted by a method comprising: if a subject is assumed to be a non-coronary heart disease subject, and the subject's probability of non-coronary heart disease diagnosed by ROC is less than 0.5 or the subject's probability of coronary heart disease diagnosed by ROC is greater than 0.5, the subject is determined to have a high probability or a higher risk of coronary heart disease, or is diagnosed as a patient with coronary heart disease.
In a particular embodiment of the present invention, the method comprises the steps of:
1) determining the content of each biomarker of the biomarker composition according to the first aspect of the present invention in blood plasma of a subject by means of liquid chromatography-mass spectrometry;
2) determining the content of the biomarker composition according to the first aspect of the present invention in blood plasma of a coronary heart disease subject and a normal subject by means of liquid chromatography-mass spectrometry, and establishing a training set (for example, as shown in Table 2) for the content of the biomarker composition by using a random forest model;
3) comparing the content of each biomarker of the biomarker composition according to the first aspect of the present invention in blood plasma of the subject to the data of the training set of the biomarker composition of the coronary heart disease subject and the normal subject by using ROC curves;
4) if a subject is assumed to be a non-coronary heart disease subject, and the subject's probability of non-coronary heart disease diagnosed by ROC is less than 0.5 or the subject's probability of coronary heart disease diagnosed by ROC is greater than 0.5, the subject is determined to have a high probability or a higher risk of coronary heart disease, or is diagnosed as a patient with coronary heart disease.
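The interpretation rule of step 4) can be sketched as a small function. This is an illustration only: the function name `flag_coronary_heart_disease` is hypothetical, and the sketch assumes that the two class probabilities sum to 1, so that "probability of non-coronary heart disease less than 0.5" and "probability of coronary heart disease greater than 0.5" are the same condition. The text does not say how a probability of exactly 0.5 is handled; here such a subject is not flagged.

```python
def flag_coronary_heart_disease(p_chd=None, p_non_chd=None, threshold=0.5):
    """Return True when the subject is flagged as high risk under the 0.5 rule.

    Exactly one of p_chd (probability of coronary heart disease) or
    p_non_chd (probability of non-coronary heart disease) must be given.
    """
    if (p_chd is None) == (p_non_chd is None):
        raise ValueError("give exactly one of p_chd or p_non_chd")
    if p_chd is None:
        # Assumption: two-class model, so the probabilities are complementary.
        p_chd = 1.0 - p_non_chd
    if not 0.0 <= p_chd <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Strict inequality: a probability of exactly 0.5 is not flagged (assumed).
    return p_chd > threshold
```

A subject with a coronary heart disease probability of 0.7, or equivalently a non-coronary heart disease probability of 0.3, would be flagged under this sketch.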
The present invention also relates to the biomarker composition according to the first aspect of the present invention, which is used in risk assessment, diagnosis, early diagnosis or pathological staging of coronary heart disease.
In one embodiment of the present invention, a liquid chromatography-mass spectrometry method is used for determining the content of each biomarker of the biomarker composition according to the first aspect of the present invention in the sample (e.g., blood plasma, whole blood) of the subject.
In one embodiment of the present invention, it further comprises a step of establishing a training set for content of each biomarker of the biomarker composition according to the first aspect of the present invention of a coronary heart disease subject and a normal subject.
In one embodiment of the present invention, the training set is established by using a multivariate statistical classification model (e.g., a random forest model).
In one embodiment of the present invention, the training set comprises data as shown in Table 2.
In one embodiment of the present invention, it further comprises a step of comparing the content of each biomarker of the biomarker composition according to the first aspect of the present invention in the sample (e.g., blood plasma, whole blood) of the subject to the data of training set for the biomarker composition of the coronary heart disease subject and the normal subject.
In one embodiment of the present invention, the comparing is a method using a receiver operating characteristic curve for comparison.
In one embodiment of the present invention, the result is interpreted by a method comprising: if a subject is assumed to be a non-coronary heart disease subject, and the subject's probability of non-coronary heart disease diagnosed by ROC is less than 0.5 or the subject's probability of coronary heart disease diagnosed by ROC is greater than 0.5, the subject is determined to have a high probability or a higher risk of coronary heart disease, or is diagnosed as a patient with coronary heart disease.
In an embodiment of the invention, the content of each biomarker in the biomarker composition and the data of content of each biomarker in the training set are obtained by the following steps:
(1) collection and treatment of samples: a blood plasma sample is collected from a clinical patient or a model animal;
the sample is subjected to processing, such as liquid-liquid extraction using an organic solvent, wherein the organic solvent includes, but is not limited to, ethyl acetate, chloroform, diethyl ether, n-butanol, petroleum ether, dichloromethane, acetonitrile, etc.; or protein precipitation, wherein the protein precipitation comprises precipitation by adding an organic solvent (such as methanol, ethanol, acetone, acetonitrile, isopropyl alcohol), precipitation by various acids, alkalis or salts, heating precipitation, filtration/ultrafiltration, solid-phase extraction, or centrifugation, used singly or in combination;
the sample is dried or not dried, and then dissolved in an organic solvent (e.g., methanol, acetonitrile, isopropanol, chloroform, etc., preferably methanol, acetonitrile) or water (in single or combination, with or without salt);
and then the sample is not derivatized or derivatized with a reagent (e.g., trimethylsilane, ethyl chloroformate, N-methyltrimethylsilyl trifluoroacetamide, etc.).
(2) liquid chromatography-mass spectrometry (HPLC-MS): a metabolite profile of blood plasma is obtained by liquid chromatography and mass spectrometry, the metabolite profile is processed to obtain data of each peak such as peak height or peak area (peak intensity), mass-to-charge ratio and retention time, in which the peak area represents biomarker content.
In a particular embodiment of the present invention, the treatment in step (1) comprises the following steps: the sample is subjected to liquid-liquid extraction with an organic solvent, or to protein precipitation; the sample is dried or not dried, and then dissolved in an organic solvent or water, used singly or in combination, wherein the water is free of salt or contains a salt, and the salt comprises sodium chloride, phosphate, carbonate and the like; the sample is not derivatized or is derivatized with a reagent.
In a specific embodiment of the present invention, in the liquid-liquid extraction with organic solvent in step (1), the organic solvent includes, but is not limited to, ethyl acetate, chloroform, diethyl ether, n-butanol, petroleum ether, dichloromethane, acetonitrile.
In a particular embodiment of the invention, the protein precipitation in step (1) comprises, but is not limited to, precipitation by adding an organic solvent, precipitation by various acids, alkalis or salts, heating precipitation, filtration/ultrafiltration, solid-phase extraction, or centrifugation, used singly or in combination, in which the organic solvent comprises methanol, ethanol, acetone, acetonitrile, or isopropanol.
In a specific embodiment of the present invention, step (1) preferably comprises performing the treatment by using a protein precipitation method, preferably a protein precipitation using ethanol.
In a specific embodiment of the present invention, in step (1), the sample is dried or not dried, and then dissolved in an organic solvent or water; the organic solvent includes methanol, acetonitrile, isopropanol, chloroform, preferably methanol, or acetonitrile.
In a specific embodiment of the present invention, in step (1), the sample is derivatized with a reagent, the reagent comprises trimethylsilane, ethyl chloroformate, N-methyltrimethylsilyl trifluoroacetamide.
In a specific embodiment of the present invention, in step (2), the metabolite profile is processed to obtain raw data, the raw data are preferably data of peak height or peak area, as well as mass number and retention time of each peak.
In a specific embodiment of the present invention, in step (2), the raw data are subjected to peak detection and peak matching, the peak detection and the peak matching are preferably performed by using XCMS software.
Mass spectrometers are roughly divided into four types according to the mass analyzer: ion trap, quadrupole, electrostatic field orbital ion trap, and time-of-flight, with mass deviations of 0.2 amu, 0.4 amu, 3 ppm and 5 ppm, respectively. The experimental results in the present invention were obtained by ion trap analysis and are therefore applicable to all mass spectrometric instruments using an ion trap or a quadrupole as the mass analyzer, including Thermo Fisher's LTQ Orbitrap Velos, Fusion, Elite, etc., Waters' TQS, TQD, etc., AB Sciex 5500, 4500, 6500, etc., Agilent's 6100, 6490, Bruker's amaZon speed ETD and so on.
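Because two of the mass deviations above are given in amu and two in ppm, it can help to put them on a common scale. The following is a minimal sketch of that arithmetic; the helper names `ppm_to_amu` and `mass_window` are hypothetical. A relative deviation in ppm is converted to an absolute deviation by multiplying by the mass-to-charge ratio and dividing by 10^6.

```python
def ppm_to_amu(mz: float, ppm: float) -> float:
    """Convert a relative mass deviation in ppm to an absolute one in amu."""
    return mz * ppm / 1_000_000


def mass_window(mz: float, tolerance: float, unit: str = "amu"):
    """Return the (low, high) m/z window for a tolerance given in amu or ppm."""
    if unit == "ppm":
        half_width = ppm_to_amu(mz, tolerance)
    elif unit == "amu":
        half_width = tolerance
    else:
        raise ValueError("unit must be 'amu' or 'ppm'")
    return (mz - half_width, mz + half_width)
```

At m/z 310.04, for example, a 3 ppm deviation corresponds to roughly 0.00093 amu, far narrower than the ±0.4 amu window, which spans approximately 309.64 to 310.44.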
In an embodiment of the present invention, the content of biomarker is expressed by peak area (peak intensity) of mass spectrum.
In the present invention, the mass-to-charge ratio and the retention time have the meanings in the art.
It is well known to those skilled in the art that the mass-to-charge ratio (in atomic mass units) and the retention time of each biomarker of the biomarker composition of the present invention will fluctuate within certain ranges when different liquid chromatography-mass spectrometry devices and different detection methods are employed; wherein the mass-to-charge ratio may fluctuate within a range of ±0.4 amu, for example ±0.2 amu, for example ±0.1 amu, and the retention time may fluctuate within a range of ±60 s, for example ±45 s, for example ±30 s, for example ±15 s.
In the present invention, the methods of using the random forest model and the ROC curves are well known in the art (see the references [7] and [8]), and those skilled in the art can set and adjust parameters according to specific situations.
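As an illustration of what the area under a ROC curve (AUC) measures, the sketch below computes it directly from classifier scores. It relies on the standard identity that the AUC equals the probability that a randomly chosen diseased sample receives a higher score than a randomly chosen control sample, with ties counted as one half. The function name `roc_auc` and the example scores in the note that follows are hypothetical and are not data from the present invention.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores_pos: classifier scores of the coronary heart disease samples.
    scores_neg: classifier scores of the control samples.
    Tied pairs count as half a correct ranking.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    correct = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(scores_pos) * len(scores_neg))
```

With invented scores, `roc_auc([0.9, 0.8], [0.1, 0.2])` gives 1.0 because every diseased sample outranks every control, while `roc_auc([0.9, 0.4], [0.5, 0.1])` gives 0.75 because one of the four pairs is ranked the wrong way round.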
In the present invention, the training set and test set have the meanings well known in the art. In an embodiment of the invention, the training set refers to a data set of contents for biomarkers in samples of coronary heart disease subjects and normal subjects having given numbers. The test set is a set of data used to test the performance of the training set.
In the present invention, a training set of biomarkers of coronary heart disease subjects and normal subjects is constructed, and the content values of biomarkers of test samples are evaluated using the training set as basis.
In an embodiment of the present invention, the training set comprises data as shown in Table 2.
In the present invention, the subject may be a human or a model animal.
In the present invention, the unit of mass-to-charge ratio is amu. The amu (atomic mass unit), also known as the Dalton (Da, D), is a unit used to measure atomic or molecular mass, and is defined as 1/12 of the atomic mass of C-12.
In the present invention, one or more of the biomarkers may be used for risk assessment, diagnosis or pathological staging, etc., of coronary heart disease, preferably at least four of them, i.e., Biomarkers 1 to 3 and Biomarker 6, are used for evaluation, or all of the six biomarkers (i.e., Biomarkers 1 to 6) are used for evaluation, so as to obtain desired sensitivity and specificity.
Those skilled in the art would understand that when the sample size is further expanded, the normal content value interval (absolute value) of each biomarker in the sample can be obtained using sample detection and calculation methods known in the art. In this way, when the content of the biomarker is detected by methods other than mass spectrometry (for example, by using an antibody and an ELISA method), the absolute value of the detected biomarker content can be compared with the normal content value; optionally, risk assessment, diagnosis or pathological staging, etc., of coronary heart disease can also be achieved in combination with statistical methods.
Without being bound by any theory, the inventors have pointed out that these biomarkers are endogenous compounds present in human body. The metabolite profile of blood plasma of a subject is analyzed by the method of the present invention, and the mass value and the retention time in the metabolite profile indicate the presence and the corresponding position of the corresponding biomarker in the metabolite profile. At the same time, the biomarkers of coronary heart disease population exhibit certain content ranges in their metabolite profiles.
Endogenous small molecules in the body are the basis of life activities, and changes of disease states and body functions will inevitably lead to changes in the metabolism of the endogenous small molecules in the body. The present invention shows that there are significant differences in blood plasma metabolite profiles between the coronary heart disease group and the control group. In the present invention, a plurality of relevant biomarkers are obtained through comparison and analysis of metabolite profiles of the coronary heart disease group and the control group, which can be used in combination with high quality data of metabolite profiles of biomarkers of the coronary heart disease population and the normal population as the training set to accurately perform risk assessment, early diagnosis and pathological staging of coronary heart disease. Compared with the commonly used diagnostic methods, this method has the advantages of being noninvasive, convenient and rapid, and has high sensitivity and good specificity.
While the embodiments of the present invention will be described in detail with reference to the following examples, it will be understood by those skilled in the art that the following examples are intended to be illustrative of the invention and are not to be taken as limiting the scope of the invention. In the examples, when specific conditions are not given, conventional conditions or conditions recommended by the manufacturer are employed. The used reagents or instruments which manufacturers are not given are all conventional products commercially available in the markets.
The blood plasma samples of coronary heart disease and normal subjects in the present invention are from the Guangdong General Hospital.
1.1 Collection of samples: morning blood samples of volunteers were collected and immediately stored in a −80° C. low temperature refrigerator. A total of 52 blood samples were collected from the normal group and 40 blood samples were collected from the coronary heart disease group.
1.2 Treatment of samples: frozen samples were thawed at room temperature; 500 μL of each blood plasma sample was placed in a 2.0 mL centrifuge tube, diluted with 1000 μL of methanol, centrifuged at 10000 rpm for 5 min, and kept for later use.
1.3 Analysis by Liquid Chromatography-Mass Spectrometry
Instrument and Equipment
HPLC-MS-LTQ Orbitrap Discovery (Thermo, Germany)
Chromatographic Conditions
Column: C18 column (150 mm×2.1 mm, 5 μm); Solvent A was 0.1% (v/v) formic acid/water, and solvent B was 0.1% (v/v) formic acid/methanol; gradient elution program: 0~3 min, 5% B, 3~36 min, 5%~80% B, 36~40 min, 80%~100% B, 40~45 min, 100% B, 45~50 min, 100%~5% B, 50~60 min, 5% B; flow rate: 0.2 mL/min; injection volume: 20 μL.
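The gradient elution program above can be written as a table of breakpoints, from which the percentage of solvent B at any time can be read off. The sketch below assumes that each ramp (for example 5%~80% B from 3 to 36 min) is linear, which the text does not state explicitly; the names `GRADIENT` and `percent_b` are hypothetical.

```python
# (time in minutes, % solvent B) breakpoints taken from the program above.
GRADIENT = [
    (0.0, 5.0),
    (3.0, 5.0),
    (36.0, 80.0),
    (40.0, 100.0),
    (45.0, 100.0),
    (50.0, 5.0),
    (60.0, 5.0),
]


def percent_b(t_min: float) -> float:
    """Return % solvent B at time t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0 to 60 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            # Linear interpolation within the current segment.
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole program")
```

Under the linear-ramp assumption, the mobile phase is 42.5% B at 19.5 min (halfway through the 3~36 min ramp) and 90% B at 38 min.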
Mass Spectrometry Conditions
ESI ion source, positive ion mode for data acquisition; the mass scanning range was m/z 50~1000. ESI ion source parameters: sheath gas was 10, auxiliary gas was 5, capillary temperature was 350° C., spray voltage was 4.5 kV.
1.4 Data Processing
XCMS software (e.g., http://metlin.scripps.edu/xcms/) was used for peak detection and peak matching of raw data; and R software using PLS-DA (partial least squares-discriminant analysis) was used for pattern recognition analysis of differential variables of the metabolite profile of coronary heart disease group (
1.5 Comparison and Determination of Characteristic Metabolite Profiles
The blood plasma metabolite profile of coronary heart disease patients (
2.1 Sample collection: morning blood plasma samples of volunteers were collected and immediately stored in a −80° C. low temperature refrigerator. A total of 52 blood plasma samples were collected from the normal group and 40 blood plasma samples were collected from the coronary heart disease group.
2.2 Sample treatment: frozen samples were thawed at room temperature; 500 μL of each blood plasma sample was placed in a 2.0 mL centrifuge tube, diluted with 1000 μL of methanol, centrifuged at 10000 rpm for 5 min, and kept for later use.
2.3 Analysis by Liquid Chromatography-Mass Spectrometry
Instrument and Equipment
HPLC-MS-LTQ Orbitrap Discovery (Thermo, Germany)
Chromatographic Conditions
Column: C18 column (150 mm×2.1 mm, 5 μm); mobile phase A: 0.1% formic acid aqueous solution, mobile phase B: 0.1% formic acid in acetonitrile solution; gradient elution program: 0~3 min, 5% B, 3~36 min, 5%~80% B, 36~40 min, 80%~100% B, 40~45 min, 100% B, 45~50 min, 100%~5% B, 50~60 min, 5% B; flow rate: 0.2 mL/min; injection volume: 20 μL.
Mass Spectrometry Conditions
ESI ion source, positive ion mode for data acquisition, scanning mass m/z 50~1000. Ion source parameters ESI: sheath gas was 10, auxiliary air was 5, capillary temperature was 350° C., cone hole voltage was 4.5 kV.
2.4 Data Processing
XCMS software was used for relevant pretreatment of raw data to obtain a two-dimensional matrix data, and wilcox-test was used to statistically determine significant differences of peaks of metabolites; and PLS-DA (partial least squares-discriminant analysis) was used for pattern recognition analysis of differential variables of the metabolite profile of coronary heart disease group (
2.5 Metabolic Profile Analysis and Potential Biomarkers
2.5.1 Partial Least Squares Discriminant Analysis (PLS-DA)
PLS-DA method was used to distinguish the normal group and the coronary heart disease group, and potential markers were further screened by VIP values (Loading-plot for principal component analysis) (
2.5.2 Potential Biomarkers
The potential markers were screened according to the VIP values of the PLS-DA model for pattern recognition. The variables with VIP values greater than 1 were extracted from the PLS-DA model, variables with large deviation and relevance were further selected according to the Loading-plot, Volcano-plot and S-plot, and 6 potential biomarkers were obtained by further retaining only variables with a P value of less than 0.05 and a Q value of less than 0.05, which are shown in Table 1.
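The numerical part of this screening (VIP greater than 1, P less than 0.05, Q less than 0.05) can be sketched as a filter. The text does not say how the Q values were calculated; the sketch below assumes the Benjamini-Hochberg false discovery rate adjustment, which is one common choice. The function names, the dictionary keys and the example values in the note that follows are hypothetical, and the plot-based selection step is not covered.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (assumed here to be the Q values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q is the minimum of
    # p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def select_candidates(features, vip_min=1.0, alpha=0.05):
    """Keep features with VIP > vip_min, P < alpha and Q < alpha.

    features: list of dicts with keys "name", "vip" and "p".
    """
    q_values = benjamini_hochberg([f["p"] for f in features])
    return [
        f["name"]
        for f, q in zip(features, q_values)
        if f["vip"] > vip_min and f["p"] < alpha and q < alpha
    ]
```

With four invented features having (VIP, P) of (2.1, 0.001), (0.8, 0.002), (1.5, 0.04) and (1.2, 0.6), only the first would be kept: the second fails the VIP criterion, the third has P below 0.05 but an adjusted Q of about 0.053, and the fourth fails on P.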
2.5.3 Principal Component Analysis (PCA)
PCA is an unsupervised pattern recognition method that can visually describe differences between samples in multidimensional space. PCA analysis was performed on 83 samples of the coronary heart disease group and the control group using the resultant six differential markers. It can be seen from
2.5.4 Receiver Operating Characteristic Curve (ROC)
The six potential markers were discriminated in the normal group and the coronary heart disease group by using a random forest model (Random Forest)[7] and receiver operating characteristic curve (ROC)[8]. The data of peak areas of 92 metabolite profiles of the normal group and the coronary heart disease group were selected and used as training set via ROC modeling (see references [7] and [8]) (Table 2). In addition, 83 test samples (including 38 coronary heart disease samples and 45 normal control samples) were selected as test set. The test results showed AUC=1, FN (false negative)=0, FP (false positive)=0 (
The random forest model was used to calculate the classification ability of the six potential biomarkers for the coronary heart disease group and the normal group, and the results (arranged from high to low) are shown in Table 3. At least the top four markers in the table should be used for testing (
If mass-to-charge ratios, such as 310.04 and 311.05, were randomly removed from the training set, the AUC obtained on the ROC test set (the above 83 test set samples) decreased significantly to 0.8851, and FN and FP increased significantly to 0.184 and 0.200, respectively (
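The false negative (FN) and false positive (FP) values above read as rates over the 38 coronary heart disease samples and the 45 normal control samples of the test set. The sketch below shows how such rates would be computed from true and predicted labels; the function name `error_rates` is hypothetical. As an inference only (the text reports the rates, not the counts), 7 missed cases out of 38 and 9 false alarms out of 45 would reproduce the reported 0.184 and 0.200.

```python
def error_rates(true_labels, predicted_labels):
    """Return (false negative rate, false positive rate).

    Labels are 1 for coronary heart disease and 0 for normal control.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    positives = sum(1 for t in true_labels if t == 1)
    negatives = sum(1 for t in true_labels if t == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    pairs = list(zip(true_labels, predicted_labels))
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    return false_neg / positives, false_pos / negatives


# Inferred counts (assumption): 7 of 38 patients missed, 9 of 45 controls flagged.
truth = [1] * 38 + [0] * 45
predicted = [0] * 7 + [1] * 31 + [1] * 9 + [0] * 36
fn_rate, fp_rate = error_rates(truth, predicted)
```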
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2014/087853 | 9/30/2014 | WO | 00 |