This invention relates generally to the field of nuclear magnetic resonance (NMR) based metabolic phenotyping and, more specifically, to the use of such a method for identifying and characterizing SARS-CoV-2 infections, inflammatory conditions and cardiovascular risks.
The COVID-19 disease pandemic resulting from SARS-CoV-2 infection has, so far, resulted in over 300 million cases and more than five million deaths worldwide. The range of clinical expression of COVID-19 is extreme, varying from asymptomatic or mild to severe respiratory distress and multiple organ damage, with or without respiratory involvement. There is an unmet need for accurate diagnosis and prediction of disease severity at an early stage so that individual infections can be monitored and managed effectively. There is also an unmet medical need for new functional markers of patient recovery in COVID-19, especially for the complex systemic complications of the disease and to monitor changes in risk level during and after the acute phase.
A metabolic phenoconversion approach was recently proposed to explore the systemic shifts in plasma biochemistry resulting from SARS-CoV-2 infection and the accompanying multi-system pathological disruptions caused by the virus. Phenoconversion for COVID-19 is associated with a range of metabolic biomarkers (lipoproteins, glycoproteins, amino acids, lipids and other metabolites) that can be derived from NMR spectroscopic and mass spectrometric data. Indeed, combining NMR and mass spectrometry (MS)-generated metabolic features into an integrated supervised classification model allowed excellent discrimination between SARS-CoV-2 positive subjects and controls. This approach also enabled deep insights to be gained into the systemic nature of the COVID-19 disease, with its distinctive embedded biomarker features, including those previously observed in diabetes, cardiovascular disease, liver dysfunction, neurological disruption and acute inflammation.
Proton NMR spectroscopy has been shown to be highly effective in detecting disease signatures in biofluids such as blood plasma, and multiple NMR methods have been applied to extract latent biomarker information, using either physical NMR experiments, including two-dimensional methods, or statistical spectroscopic methods such as Statistical Total Correlation SpectroscopY (STOCSY) and related techniques. Although physical procedures can be used to extract, separate and augment detection and identification of metabolites and lipoproteins in plasma, one of the key advantages of NMR spectroscopy is its non-invasive and non-destructive nature, which enables the interrogation of molecular interactions, complexation and physical dynamics of complex mixtures that can carry extra diagnostic information. This also allows the sample to be retained for further experimentation and elucidation of additional diagnostic information.
Plasma glycoproteins are biosynthesised and released mainly from the liver; they are enzymatically glycosylated and assist solubilization of multiple hydrophobic compounds in the blood. It has been reported that the well-resolved N-acetyl signals from glycosylated amino sugar residues in acute phase reactive proteins such as α-1 N-acetyl-glycoprotein in NMR spectra of blood plasma are elevated in multiple inflammatory states, including obesity, diabetes, cardiovascular disease, rheumatoid arthritis and systemic immune-pathological conditions such as HIV infection and systemic lupus erythematosus. These NMR signals are now widely described as GlycA and GlycB.
The GlycA signal (δ 2.03) is a composite of signals from primarily five proteins: α-1-acid glycoprotein, α-1-antichymotrypsin, α-1-antitrypsin, haptoglobin and transferrin. In α-1-acid glycoprotein, which is present at approximately 20 μM in healthy individuals, the signal originates from five N-linked oligosaccharide chains on a backbone of 183 amino acid residues. The α-1-acid glycoprotein has the strongest correlation with the GlycA signal and is thought to account for most of the signal, although inter-individual differences in the levels of these five glycoproteins have been reported. Multiple biological functions have been ascribed to α-1-acid glycoprotein, including modulating immunological function via a macrophage-released inhibitory factor that acts to prevent IL-1 activation of thymocyte proliferation, stimulating lymphocyte proliferation, serving as a drug transporter and inhibiting platelet aggregation. Acute phase inflammation has been associated with two- to five-fold increases in plasma GlycA signals. The GlycB acetyl signal (δ 2.07) arising from glycoprotein N-acetylneuraminidino-groups has also been observed to increase in various inflammatory conditions such as diabetes and obesity. Both GlycA and GlycB have been shown to correlate with C-Reactive Protein (CRP) levels in plasma, and it has been suggested that GlycA and GlycB may be superior to CRP, the main clinical chemistry marker of inflammation, as biomarkers of systemic inflammation. It has also been recently reported that GlycA and GlycB are significantly elevated in COVID-19 patients and are strong markers of disease positivity.
Low-density lipoprotein cholesterol (LDL), high-density lipoprotein cholesterol (HDL), apolipoprotein B100 (ApoB) and apolipoprotein A1 (ApoA1) have been associated with cardiovascular risks. In particular, the ApoB/ApoA1 ratio has been shown to predict cardiovascular events. As each non-HDL particle carries one ApoB, non-HDL and non-HDL-C have been reported to correlate with ApoB and may also be used as markers for cardiovascular diseases. Although both parameters are correlated, their concordance is still under discussion.
The non-destructive nature of NMR allows the study of complex supramolecular structures in the natural state in multiphasic samples such as blood plasma. The proton T2 relaxation properties allow for differential spectral editing, for instance, to remove broad macromolecular envelopes in blood or plasma based on their short proton T2's. Translational diffusion can also be measured using pulsed field gradients and used to selectively attenuate signals from small molecules that have fast translational motion. Then, mathematical transformations allow the construction of 2D spectra as in Diffusion Ordered SpectroscopY (DOSY), which is now the commonly applied method in most fields. It is also possible to combine motional-editing in two-dimensional experiments such as Diffusion-Edited Total Correlation SpectroscopY (DE-TOCSY), or both types of motional editing together including Diffusion and Relaxation Editing (DIRE). It has previously been shown that DIRE spectra enhance signals from molecules with slow translational diffusion but with high segmental motional freedom, and these requirements are satisfied by plasma glycoproteins and molecules constrained within certain lipoprotein sub-compartments.
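The way the two motional filters combine can be illustrated numerically. The following Python sketch is not the pulse sequence of the invention; it assumes a simple mono-exponential model in which a diffusion filter attenuates a signal by exp(-b·D) (the Stejskal-Tanner form, with b lumping together gradient strength and timing) and a relaxation filter attenuates it by exp(-tau/T2). The function name, the values of b and tau, and the D and T2 values of the three hypothetical species are all illustrative assumptions, not measured data.

```python
import math

def dire_retention(D, T2, b=5.0e9, tau=0.08):
    """Fraction of signal surviving a diffusion filter and a T2 filter.

    D   : translational diffusion coefficient in m^2/s
    T2  : transverse relaxation time in s
    b   : diffusion weighting in s/m^2 (assumed value)
    tau : total spin-echo time in s (assumed value)
    """
    diffusion_factor = math.exp(-b * D)      # fast diffusers are attenuated
    relaxation_factor = math.exp(-tau / T2)  # short-T2 signals are attenuated
    return diffusion_factor * relaxation_factor

# Hypothetical species; the D and T2 values are order-of-magnitude guesses.
species = {
    "small metabolite (fast diffusion, long T2)": (7.0e-10, 1.0),
    "rigid macromolecule core (slow diffusion, short T2)": (5.0e-11, 0.005),
    "mobile segment of large particle (slow diffusion, long T2)": (5.0e-11, 0.2),
}

retention = {name: dire_retention(D, T2) for name, (D, T2) in species.items()}
for name, value in retention.items():
    print(f"{name}: {value:.3g}")
```

Under these assumed values only the slowly diffusing species with high segmental mobility keeps a substantial fraction of its signal, which is the qualitative behaviour described for DIRE spectra.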
In accordance with the present invention, Nuclear Magnetic Resonance (NMR) spectroscopy based metabolic phenotyping of plasma is employed to reveal novel diagnostic molecular signatures of inflammation and/or other medical conditions, such as SARS-CoV-2 infection. The NMR methods may use a Diffusional and Relaxation Editing (DIRE) pulse sequence, with or without additional relaxation delay, diffusion or scalar couplings editing, such as J-coupling editing (JEDI). Other pulse sequences may also be used, such as a Pulsed Gradient Spin Echo (PGSE) sequence, a Pulsed Gradient Double Echo (PGDE) sequence, or a Pulsed Gradient Spin Echo×5 (PGSE-5).
The NMR analysis can be done using plasma or serum samples from patients, and the features that are measured show clear differences between those patients who are comparatively healthy and those who have the condition in question, such as SARS-CoV-2 RT-PCR positive respiratory patients. In particular, the NMR spectra produced show unique biomarker signal combinations and patterns conferred by differential concentrations of metabolites with selected molecular mobility properties. These include: a) composite N-acetyl (—NCOCH3) signals from α-1-acid glycoprotein and other glycoproteins (GlycA and GlycB) that are elevated in SARS-CoV-2 positive patients (p=2.52×10−10 and 1.25×10−9 versus controls respectively); and b) newly-identified Supramolecular Phospholipid Composite signals from the —+N—(CH3)3 choline headgroups that are associated with HDL and LDL subfractions. In one embodiment, two such signals are considered: SPC-A, which corresponds to phospholipids in the HDL subfraction; and SPC-B, which corresponds to a phospholipid component of LDL. In another embodiment, the SPC signals correspond to three different regions of SPC peaks: SPC3, which correlates with LDL (a signal range of δ 3.26-3.30 ppm); SPC2, which correlates with HDL (a signal range of δ 3.235-3.26 ppm); and SPC1, which correlates with H4PL (the subfraction 4 of HDL, i.e., the higher density fraction) (a signal range of δ 3.20-3.235 ppm).
The overall SPC signal is equal to the sum of the signals in the subdivided SPC regions. Thus, SPCtotal=SPC-A+SPC-B in one embodiment, and SPCtotal=SPC1+SPC2+SPC3 in another embodiment. As a whole, SPC appears reduced in SARS-CoV-2 positive patients relative to both controls (p=1.40×10−7) and SARS-CoV-2 negative patients (p=4.52×10−8), but is not significantly different between controls and SARS CoV-2 negative patients. SPC/GlycA ratios are also significantly different for normal vs SARS-CoV-2 positive patients (p=1.23×10−10) and for SARS-CoV-2 negatives versus positives (p=1.60×10−9). By using SPCtotal and SPCtotal/GlycA as sensitive new molecular markers for diagnosing certain conditions, such as SARS-CoV-2 positivity, the invention augments current COVID-19 diagnostics and may be employed in functional assessment of the disease recovery process.
The collection of the biomarkers described above may be done in accordance with an exemplary embodiment of the invention as discussed below. Although the description is with regard to SARS-CoV-2 infection, those skilled in the art will understand that this represents only an example of the conditions for which such biomarkers may be collected. The method is equally applicable to cardiovascular risk, inflammatory states, or other similar acute or chronic conditions. Moreover, those skilled in the art will understand that the invention is not limited to this particular sequence of steps, and that different variations may exist for collecting the relevant data.
Sample preparation—Blood is collected from SARS-CoV-2 positive and matched SARS-CoV-2 negative subjects using standard phlebotomy methods. The collection tube is incubated and spun at a temperature and speed that meet known guidelines. The plasma or serum is removed from the blood collection tubes and centrifuged to obtain a supernatant, to which an appropriate buffer is added. The mixture is then transferred to an NMR tube, which is placed in the NMR instrument for analysis.
NMR verification check—prior to analysis the NMR spectrometer should be calibrated using calibration samples, such as a temperature calibration sample, a sucrose sample and a Quantref sample or a similar method that ensures reproducible quantitative data. A spectrum of the plasma/serum sample is then acquired using the desired pulse sequence, such as a DIRE sequence. Although 1D NMR spectra are usually quite complicated with thousands of peaks, the DIRE NMR experiment edits out many of these peaks leaving only those from a flexible domain of macromolecules such as proteins and large phospholipid complexes. The NMR measurement is then carried out on the blood plasma or serum sample and the obtained NMR signal intensities are used for data analysis.
Data analysis—Prior to statistical analysis, all spectra are pre-processed. In the exemplary embodiment, the residual water resonances are removed (δ 4.5-5.0), as are the chemical shift regions where no signals of interest are observed (δ<0.25 and δ>9.5). The anomeric glucose peak is set to δ5.23 ppm. In this document, all the chemical shifts were reported after calibration to Glucose. Optionally, the spectra are then baseline corrected and calibrated, although doing so is not mandatory.
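The pre-processing steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the processing code used in the study: the function names are hypothetical, the spectrum is a synthetic list of (ppm, intensity) points, and the glucose calibration simply moves the tallest point found in an assumed search window (δ 5.15-5.35) to δ 5.23, which ignores the doublet structure of the real anomeric signal.

```python
def calibrate_to_glucose(ppm, intensity, search=(5.15, 5.35), target=5.23):
    """Shift the ppm axis so the tallest point in `search` sits at `target`."""
    candidates = [(y, x) for x, y in zip(ppm, intensity)
                  if search[0] <= x <= search[1]]
    _, peak_ppm = max(candidates)        # raises ValueError if window is empty
    offset = target - peak_ppm
    return [x + offset for x in ppm]

def excise_regions(ppm, intensity, water=(4.5, 5.0), low_cut=0.25, high_cut=9.5):
    """Drop the residual water window and the regions below/above the cuts."""
    kept = [(x, y) for x, y in zip(ppm, intensity)
            if low_cut <= x <= high_cut and not (water[0] <= x <= water[1])]
    return [x for x, _ in kept], [y for _, y in kept]

# Synthetic spectrum on a 0.01 ppm grid running from 10.00 down to 0.00 ppm.
ppm = [10.0 - 0.01 * i for i in range(1001)]
intensity = [0.0] * len(ppm)
intensity[475] = 1.0   # pretend anomeric glucose peak, observed near 5.25 ppm

ppm_cal = calibrate_to_glucose(ppm, intensity)
ppm_kept, intensity_kept = excise_regions(ppm_cal, intensity)
print(round(ppm_cal[475], 3), len(ppm), len(ppm_kept))
```

Calibration is applied before excision in this sketch so that the excised windows are defined on the calibrated axis; the text does not prescribe an order, so this is a design choice of the example.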
The statistical analysis is performed using a multivariate statistical procedure such as O-PLS-DA, which produces a scores plot in which each point corresponds to a sample and which gives a maximum separation between samples of the different classes. The model also provides loadings values, which correspond to the various NMR intensities and are examined to determine which NMR spectral features are responsible for the separation seen in the scores plot. This operation is done using standard statistics to ensure that the derived features are statistically valid. The NMR features responsible for class separation are then examined.
An initial visual inspection of the scores plot is done to determine, if possible, specific molecules that are responsible for the features. More specialized NMR experiments are thereafter used, as necessary, to provide information on the class separating molecules (the biomarkers), which leads to identification on a molecular basis. These include statistical approaches such as STOCSY, the use of appropriate databases, and a range of 1- and 2-dimensional NMR spectra. The derived classifying NMR features can be tested using spectra from further samples (a test set) to check the validity of the class prediction. This test set can be a sub-set of the original spectral set or new spectra.
In order to obtain the intensity of the markers SPCtotal (δ 3.20-3.30; Glucose 5.23), GlycA (δ 2.03; Glucose 5.23) and GlycB (δ 2.07; Glucose 5.23), the points for each of these regions and for each sample are summed and used to calculate certain informative ratios, such as SPCtotal/GlycA. If the intensity values or ratios are below (or above) a given threshold, it can be deduced that the patient has a condition, especially an inflammatory or risk signature, such as SARS-CoV-2 positivity.
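The summation and ratio step can be sketched in Python as shown here. The function names are hypothetical, the spectrum is synthetic, and the GlycA window (δ 2.00-2.09) is taken from the exemplary sub-regions given elsewhere in this description. The threshold value is a placeholder chosen only so that the example runs; it is not a clinical cut-off and no such value is stated in the text.

```python
SPC_WINDOW = (3.20, 3.30)     # ppm, SPCtotal region from the text
GLYCA_WINDOW = (2.00, 2.09)   # ppm, exemplary GlycA sub-region

def region_sum(ppm, intensity, lo, hi):
    """Sum all spectral points whose chemical shift lies in [lo, hi]."""
    return sum(y for x, y in zip(ppm, intensity) if lo <= x <= hi)

def spc_glyca_ratio(ppm, intensity):
    """Return SPCtotal/GlycA computed from summed region intensities."""
    spc_total = region_sum(ppm, intensity, *SPC_WINDOW)
    glyc_a = region_sum(ppm, intensity, *GLYCA_WINDOW)
    if glyc_a <= 0:
        raise ValueError("GlycA region has no positive intensity")
    return spc_total / glyc_a

# Synthetic spectrum on a 0.01 ppm grid (0.00 ... 10.00 ppm).
ppm = [i / 100 for i in range(1001)]
intensity = [0.0] * len(ppm)
for i, x in enumerate(ppm):
    if SPC_WINDOW[0] <= x <= SPC_WINDOW[1]:
        intensity[i] = 2.0    # 11 points in the SPC window
    elif GLYCA_WINDOW[0] <= x <= GLYCA_WINDOW[1]:
        intensity[i] = 1.0    # 10 points in the GlycA window

HYPOTHETICAL_THRESHOLD = 3.0  # placeholder only, not a validated cut-off
ratio = spc_glyca_ratio(ppm, intensity)
flagged = ratio < HYPOTHETICAL_THRESHOLD
print(ratio, flagged)
```

The comparison uses "below threshold" because SPC is described as reduced and GlycA as elevated in SARS-CoV-2 positive patients, so a lower SPCtotal/GlycA ratio is the direction of interest in this example.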
In the context of this description, NMR signal intensity shall refer to the peak maximum height or, more preferably, the peak integrals which correspond to the area under the peak, which is typically more accurate as it remains constant if the peak shape varies, e.g., due to relaxation differences.
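The reason the integral is preferred over the peak height can be shown with a short Python sketch. It assumes Gaussian line shapes purely for numerical convenience (real NMR lines are closer to Lorentzian), and the peak position, areas and widths are illustrative values. Two peaks with the same area but different widths are generated; their heights differ while their trapezoidal integrals agree.

```python
import math

def gaussian_peak(ppm, centre, area, sigma):
    """Gaussian line with the given total area and standard deviation."""
    scale = area / (sigma * math.sqrt(2 * math.pi))
    return [scale * math.exp(-0.5 * ((x - centre) / sigma) ** 2) for x in ppm]

def trapezoid(ppm, intensity):
    """Area under the curve by the trapezoidal rule."""
    return sum(0.5 * (intensity[i] + intensity[i + 1]) * (ppm[i + 1] - ppm[i])
               for i in range(len(ppm) - 1))

ppm = [1.90 + 0.0005 * i for i in range(601)]   # 1.90 ... 2.20 ppm
narrow = gaussian_peak(ppm, 2.03, area=1.0, sigma=0.005)
broad = gaussian_peak(ppm, 2.03, area=1.0, sigma=0.010)

height_ratio = max(narrow) / max(broad)
area_narrow = trapezoid(ppm, narrow)
area_broad = trapezoid(ppm, broad)
print(height_ratio, area_narrow, area_broad)
```

Doubling the line width halves the peak height in this model, whereas the integral is unchanged, which is the behaviour the definition above relies on.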
The addition of J-coupling editing (JEDI) to the DIRE experiment improves the accuracy of GlycA and GlycB measurements, using the same protocol for sample preparation and data processing, as it removes perturbations from lipoproteins and lipids in the vicinity of the GlycA peak. The correlation between GlycA and GlycB values obtained with JEDI and similar sequences (that include J-coupling editing) is much higher than the ones obtained using DIRE only, as the interference from the lipoproteins is efficiently suppressed by JEDI.
SPC is associated primarily with the HDL and LDL fractions. More specifically, different regions of the SPC peaks can be defined (identified herein as SPC1, SPC2 and SPC3) which are found to correlate with LDL (SPC3, δ 3.26-3.30 ppm), HDL (SPC2, δ 3.235-3.26 ppm) and H4PL (SPC1, δ 3.20-3.235 ppm), and which can be associated with these fractions by linear regression. As the concentrations of LDL and HDL are related to the concentrations of apoprotein B-100 and A1, respectively, the SPC peak can be used as a marker of cardiovascular risk. The downfield region (left hand side) of the SPC peak stems mainly from LDL contributions, the center region of the peak is associated with the HDL fraction, while the highfield region (right hand side) is connected to the H4 subfraction.
In addition to the direct diagnostic value of the biomarker signals and ratios, a significant association is observed between the SPC/GlycA ratio and BMI (Body Mass Index) intervals. This evidences the potential of the aforementioned SPC biomarkers in a wider range of applications than those discussed herein, as well as their complementarity with the GlycA biomarker for use in other diagnostics.
The present invention relates also to the use of Glyc and/or SPC (or any subfraction of them) as biomarkers for the diagnosis of SARS-CoV-2 and/or for the assessment of a functional recovery from a SARS-CoV-2 infection.
Furthermore, the present invention relates to a system for extracting related medical information referring to a SARS-CoV-2 condition or risk or inflammatory condition by proton NMR, the system comprising: an NMR spectrometer for acquiring at least one NMR spectrum of an in vitro blood plasma or serum sample; and a processor in communication with the NMR spectrometer. The processor is configured to (i) obtain concentration measurements of the biomarkers Glyc and SPC in said blood plasma or serum sample, which relate to the NMR signals of Glyc (δ 2.00-2.20 ppm) and the choline head group (+N—(CH3)3) signal at δ 3.20-3.30 ppm from phospholipids (the SPC signal), and (ii) calculate an inflammatory condition based on the obtained signal intensities and their ratios.
The invention thus provides 1H-NMR spectroscopic molecular markers and the use thereof for identifying one or more medical risk signatures in an in vitro blood plasma or serum sample from a patient by 1H NMR spectroscopy. In general, the markers comprise a combination of NMR intensity signals having magnitudes that are significantly different from known corresponding NMR intensity levels for a healthy patient, including a Glyc signal from at least one N-acetyl (—NCOCH3) glycoprotein and an SPC signal from a choline head group (+N—(CH3)3) of a supramolecular phospholipids cluster (SPC) present in HDL and LDL lipoprotein subfractions. In the exemplary embodiment, the Glyc signal is in a chemical shift region from δ=2.00 ppm to δ=2.20 ppm and the SPC signal is in a chemical shift region from δ=3.20 ppm to δ=3.30 ppm. The Glyc signal itself may be subdivided into a signal GlycA in a chemical shift subregion of δ=2.00 ppm to δ=2.09 ppm and a signal GlycB in a chemical shift subregion of δ=2.09 ppm to δ=2.20 ppm. Similarly, the SPC signal may be subdivided into a signal SPC1 in a chemical shift subregion from δ=3.20 ppm to δ=3.235 ppm, a signal SPC2 in a chemical shift subregion from δ=3.235 ppm to δ=3.26 ppm, and a signal SPC3 in a chemical shift subregion from δ=3.26 ppm to δ=3.30 ppm.
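The sub-region boundaries of the exemplary embodiment lend themselves to a simple lookup, sketched here in Python. The function and constant names are hypothetical. The text does not state which sub-region owns a shared boundary (for example δ 2.09), so the sketch assumes half-open intervals in which each boundary belongs to the sub-region above it.

```python
from bisect import bisect_right

# Sub-region edges in ppm, taken from the exemplary embodiment.
GLYC_EDGES = [2.00, 2.09, 2.20]
GLYC_NAMES = ["GlycA", "GlycB"]
SPC_EDGES = [3.20, 3.235, 3.26, 3.30]
SPC_NAMES = ["SPC1", "SPC2", "SPC3"]

def marker_for_shift(ppm):
    """Return the marker sub-region containing `ppm`, or None.

    Intervals are treated as half-open [lower, upper); this convention is
    an assumption of the sketch, not something stated in the text.
    """
    for edges, names in ((GLYC_EDGES, GLYC_NAMES), (SPC_EDGES, SPC_NAMES)):
        if edges[0] <= ppm < edges[-1]:
            return names[bisect_right(edges, ppm) - 1]
    return None

for shift in (2.03, 2.07, 2.12, 3.22, 3.25, 3.28, 2.50):
    print(shift, marker_for_shift(shift))
```

Note that with these exemplary sub-regions a shift of δ 2.07 falls in the GlycA window, even though δ 2.07 is quoted elsewhere as the nominal GlycB position; the lookup only reflects the interval boundaries as written.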
The molecular markers may also include various ratios of the Glyc and SPC signal or portions thereof, such as a ratio of NMR peak intensities of Glyc or either of GlycA or GlycB to NMR peak intensities of SPC or one or more of SPC1, SPC2 or SPC3. For example, one useful ratio is SPCtotal/GlycA, where SPCtotal=SPC1+SPC2+SPC3. The markers are highly useful, for example, for diagnosing medical risk signatures that include a SARS-CoV-2 infection, acute inflammation, certain cardiovascular risk conditions, as well as for assessment of a functional recovery from a SARS-CoV-2 infection. To use the markers for diagnosing such a risk signature, a blood plasma or serum sample is obtained from a patient, and an NMR measurement of the sample is performed to obtain a spectrum of NMR intensities. The magnitudes of a combination of the Glyc and SPC NMR intensity signals are then determined, and the presence of the risk signature is diagnosed when the magnitudes of the glycoprotein NMR intensities and the SPC NMR intensities, or the ratios derived therefrom, are significantly different from known corresponding NMR intensity levels or ratios for a healthy patient, such as by being beyond a predetermined threshold.
A system for identifying the medical risk signatures is also provided by the invention, and includes an NMR spectrometer for acquiring at least one 1H NMR spectrum of the in vitro blood plasma or serum sample, and a data processor in communication with the NMR spectrometer. The data processor is configured to obtain concentration measurements of the NMR intensity signals of one or more of the markers, which would indicate the presence of the disease condition when their magnitudes are significantly different from known corresponding NMR intensity levels for a healthy patient. As discussed above, the markers include a Glyc signal from at least one N-acetyl (—NCOCH3) glycoprotein and an SPC signal from a choline head group (+N—(CH3)3) of a supramolecular phospholipids cluster (SPC) present in HDL and LDL lipoprotein subfractions.
The biomarkers and diagnostic methods using them are demonstrated herein, in part, by describing a research study conducted by the inventors that confirmed their effectiveness, in particular with regard to the diagnosis of SARS-CoV-2 infection. The following discussion of this study provides an exemplary embodiment for implementation of the invention, but those skilled in the art will recognize that the principles applied therein are extendable to other specific applications, which are likewise considered to be within the scope of the present invention.
Patient Enrolment and Sample Collection for Western Australian Cohort: Blood plasma samples were collected into potassium EDTA sample tubes from a cohort of adult individuals in a study initiated at the Fiona Stanley Hospital in the Western Australia South Metropolitan Health Service catchment as part of the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC)/World Health Organisation (WHO) pandemic trial framework (SMHS Research Governance Office PRN: 3976 and Murdoch University Ethics no. 2020/052). Healthy control participants were enrolled as volunteers and provided with study details, and written informed consent was obtained prior to data collection in accordance with ethical governance (Murdoch University Ethics no. 2020/053). Five groups of participants were recruited from the Fiona Stanley and Royal Perth Hospitals: i) patients who presented COVID-19 disease symptoms and subsequently tested positive for SARS-CoV-2 infection from upper and/or lower respiratory tract swabs by RT-PCR (n=17 patients, sampled at various times resulting in n=58 plasma specimens); ii) healthy controls who had not exhibited COVID-19 disease symptoms (n=26 participants); iii) individuals with respiratory disease symptoms who tested negative for SARS-CoV-2 and were non-hospitalized (n=23 participants); iv) hospitalized SARS-CoV-2 negative respiratory patients (n=11); and v) individuals who were serologically IgA positive for COVID-19 (n=6). Serological testing for SARS-CoV-2 antibodies was performed at the PathWest clinical testing laboratories, Western Australia, using 10 μL of plasma in a commercial point-of-care serological COVID-19 IgA/IgG test. Samples were considered SARS-CoV-2 positive if IgA>1.0, or equivocal where IgA=0.8-1.0. Demographic data together with the clinical symptoms are shown below in Tables 1-3. IgA and IgG levels are reported in Table 4. Plasma samples were stored at −80° C. until required for analysis.
Patient Enrolment and Sample Collection for Autonomous Community of the Basque Country (Spain) cohort: The cohort consisted of i) patients who tested positive for SARS-CoV-2 infection from upper and/or lower respiratory tract swabs by RT-PCR (n=36) and ii) healthy control participants (n=80). All serum samples were collected by the Basque Biobank for research (BIOEF). Healthy serum samples were collected before the COVID-19 pandemic from the active population while the COVID-19 samples were collected at the Cruces University Hospital (Barakaldo, Spain) from patients who presented compatible symptoms, confirmed by a RT-PCR assay on nasal swab samples. All participants provided informed consent to clinical investigations, according to the Declaration of Helsinki, and all data were anonymized to protect their confidentiality. The sample handling protocol was evaluated and approved by the Comité de Ética de Investigación con medicamentos de Euskadi (CEIm-E, PI+CES-BIOEF 2020-04 and PI219130). Shipment of human samples to ANPC had the approval of the Ministry of Health of the Spanish Government. Samples were stored at −80° C.
Patient Enrolment and Sample Collection for Microbiome Understanding in Maternity Study (MUMS) cohort: The cohort consisted of women (n=99) who were recruited in their first trimester of pregnancy and followed through seven time points: trimesters one, two and three, the time of birth, and then six weeks, six months and 12 months postpartum. All serum samples were collected at the University of New South Wales (UNSW), Microbiome Research Centre (MRC), Sydney, Australia. All participants provided informed consent to clinical investigations, according to the Declaration of Helsinki, and all data were anonymized to protect their confidentiality. The sample handling protocol was evaluated and approved by the South Eastern Sydney Local Health District Research Ethics Committee (17/293 (HREC/17/POWH/605)). Shipment of human samples to ANPC had the approval of the University of New South Wales. Samples were stored at −80° C.
Sample Processing: Blood samples were centrifuged at 13,000 g to separate the plasma, which was then frozen at −80° C. until use. Plasma samples were thawed at 20° C. for 30 minutes then centrifuged at 13,000 g for 10 minutes at 4° C. Plasma samples were prepared in 5 mm outer diameter SampleJet™ NMR tubes, following the recommended procedures for in vitro analytical and diagnostics procedures using 300 μL of plasma mixed with 300 μL of phosphate buffer (75 mM Na2HPO4, 2 mM NaN3, 4.6 mM sodium trimethylsilyl propionate-[2,2,3,3-2H4] (TSP) in 80% D2O, pH 7.4±0.1). NMR SampleJet™ tubes were sealed with POM balls added to the caps and stored in 96 well plates. All processing procedures were compliant with previous recommendations on sample handling and storage for COVID-19 samples.
Quantification of plasma lipoproteins—A total of 112 lipoprotein parameters were quantified based on 1D NMR experiments as part of Bruker's IVDr experiment suite for blood plasma (Table 5). This approach is termed Bruker's IVDr lipoprotein class and subclass Analysis (B.I.-LISA™), and the lipid analytes include cholesterol, free cholesterol, phospholipids, triglycerides, apolipoproteins A1/A2/B100 and the ratio B100/A1, in total plasma concentration and resolved for main lipoprotein classes and subclasses. Main classes of plasma lipoproteins were defined as: high-density lipoprotein (HDL, density 1.063-1.210 kg/L), intermediate-density lipoprotein (IDL, density 1.006-1.019 kg/L), low-density lipoprotein (LDL, density 1.019-1.063 kg/L) and very low-density lipoprotein (VLDL, density 0.950-1.006 kg/L). The main lipoprotein classes HDL, LDL and VLDL were further divided into sub-classes: LDL into six density classes (LDL-1: 1.019-1.031 kg/L, LDL-2: 1.031-1.034 kg/L, LDL-3: 1.034-1.037 kg/L, LDL-4: 1.037-1.040 kg/L, LDL-5: 1.040-1.044 kg/L, LDL-6: 1.044-1.063 kg/L), HDL into four density classes (HDL-1: 1.063-1.100 kg/L, HDL-2: 1.100-1.125 kg/L, HDL-3: 1.125-1.175 kg/L, and HDL-4: 1.175-1.210 kg/L), and VLDL into five density classes.
600 MHz Proton NMR Spectroscopy and In Vitro Diagnostic Experiments: NMR spectroscopic analyses were performed on a 600 MHz Bruker BioSpin Corp. Avance III HD spectrometer equipped with a 5 mm BBI probe and fitted with the Bruker SampleJet™ robot cooling system set to 5° C. A full quantitative calibration was completed prior to the analysis using a previously described protocol. A series of NMR experiments were performed, comprising Bruker's In Vitro Diagnostics research (IVDr) methods set, including: i) a standard 1D experiment with solvent pre-saturation; ii) a Carr-Purcell-Meiboom-Gill (CPMG) spin-echo experiment; and iii) a 2-Dimensional J-resolved experiment. The total experiment time was 12.5 minutes per sample. Data was processed in automation mode using Bruker Topspin™ 3.6.2 and ICON™ NMR to achieve phasing, baseline correction and calibration to TSP. Further regression experiments were performed to quantify 112 parameters of main plasma lipoprotein classes and subclasses (Bruker IVDr Lipoprotein Subclass Analysis, B.I.-LISA™) based on a PLS-regression model using the —(CH2)n (δ 1.25) and —CH3 (δ 0.80) signal.
Diffusion and Relaxation Editing NMR Experiments: A Diffusion Relaxation Editing (DIRE) approach was applied to further investigate its diagnostic potential for assessment of SARS-CoV-2 positivity.
The DIRE sequence used in the exemplary embodiment is shown in
The other new sequences represent alternative examples that may be used with the invention. In the sequences shown in
In
Identification of Key Molecular Species in DIRE Spectral Signatures: DIRE spectra have a relatively small number of discrete and composite signal features consisting of a general triglyceride pattern with aliphatic side chains and typical signals from the methine (unsaturated) carbons, a strong N-acetyl signal attributed to acute phase reactive glycoproteins (mainly GlycA at δ 2.03), with a smaller but significant contribution from GlycB at δ 2.07, and a choline head group (+N—(CH3)3) signal at δ 3.20-3.30 from phospholipids including phosphatidyl- and lysophosphatidylcholines (also with aliphatic side chains overlapped with the triglycerides), that can be further decomposed into two major contributions from signals at δ 3.22 (SPC1) and δ 3.26 (SPC3). All of these signals are to a greater or lesser extent composite peaks, as described below. To be expressed in a DIRE spectrum, the molecular species all share some commonality of molecular motion, diffusion and relaxation times.
The GlycA signal contains contributions from α-1-acid glycoprotein, accounting for most of the intensity, with lesser contributions from a composite of signals from α-1-antichymotrypsin, α-1-antitrypsin, haptoglobin and transferrin. The role of α-1-acid glycoprotein in binding a range of lysophosphatidylcholines in a 1:1 molar ratio as well as small lipophilic molecules has previously been established. However, in contrast to GlycA and GlycB, which increase as a response to SARS-CoV-2 infection, most (lyso)-phosphatidylcholine species decrease. This contrasting behaviour is captured in the OPLS-DA coefficient plot (
Data Pre-processing and Statistical Evaluation: 1D spectral data pre-processing comprised the excision of the residual water resonances (δ 4.5-5.0) and chemical shift regions with no signals of interest (δ<0.25 and δ>9.5). The chemical shift axis was calibrated to the α-anomeric proton of glucose (δ 5.23). A similar procedure was applied to CPMG spectra, whereas the chemical shift calibration of DIRE spectra reflects the frequency of the water suppression as no further calibration is performed. Other DIRE calibration options were tested using DIRE signals as calibrants. Spectra were baseline corrected using an asymmetric least squares method. To aid the assignment of structural information relating to DIRE signals δ 2.03 (GlycA) and δ 3.20-3.30 (SPC), 1D Statistical Total Correlation Spectroscopy (STOCSY) was applied using the DIRE spectra from all study groups (healthy controls, patients with and without SARS-CoV-2 infection). Statistical evaluation was performed with principal components analysis (PCA) as an unsupervised multivariate method, compressing the high-dimensional spectral data set into a few latent variables, thereby establishing potential sample clustering trends based on covariance structure. Group comparison was performed using orthogonal-projections to latent structures-discriminant analysis (O-PLS-DA) using a training set of seven SARS-CoV-2 positive and seven age and sex matched healthy controls. The optimal number of components (1 predictive+1 orthogonal) in the model was established using the area under the receiver operator characteristic curve (AUROC) as model generalisation index, computed in a jack-knifing statistical cross-validation framework. Data were mean-centred and scaled to unit-variance prior to multivariate modelling. All data analysis tasks were performed in the statistical programming language R, using the metabom8 package (V 0.2), obtainable at https://github.com/tkimhofer.
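The mean-centring and unit-variance scaling step mentioned above can be sketched in Python as follows. This is an illustration only and does not reproduce the metabom8 implementation used in the study: the function name is hypothetical, the sample standard deviation (n-1 denominator) is an assumption, and the small matrix is made-up data with samples in rows and spectral variables in columns.

```python
import math

def autoscale(matrix):
    """Mean-centre each column and scale it to unit variance.

    `matrix` is a list of rows (samples); columns are spectral variables.
    Columns with zero variance are returned as zeros to avoid division by 0.
    """
    n_rows = len(matrix)
    scaled_columns = []
    for col in zip(*matrix):
        mean = sum(col) / n_rows
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n_rows - 1))
        if sd == 0:
            scaled_columns.append([0.0] * n_rows)
        else:
            scaled_columns.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_columns)]

# Made-up example: three samples, three variables (the last one is constant).
X = [[1.0, 10.0, 5.0],
     [2.0, 20.0, 5.0],
     [3.0, 30.0, 5.0]]
X_scaled = autoscale(X)
for row in X_scaled:
    print(row)
```

After scaling, variables measured on very different intensity scales contribute equally to the multivariate model, which is why the step precedes PCA and O-PLS-DA.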
In order to evaluate associations between GlycA, GlycB, SPC and plasma lipoproteins measured by IVDr, a Spearman's correlation analysis was performed using the respective signal integrals. Results were visualized as heatmaps in which rows and columns were ordered according to lipoprotein density classes. Features with low correlation coefficients are not shown.
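A minimal sketch of the Spearman's correlation referred to above is given below for illustration. It assumes the usual definition (Pearson correlation of average ranks, with ties sharing their mean rank); the function names and the toy integrals are hypothetical and are not study data.

```python
def average_ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy integrals (not study data): a signal integral and a lipoprotein parameter.
rho = spearman([0.8, 1.1, 1.5, 2.0], [10.0, 14.0, 13.0, 20.0])
```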
NMR Spectra of Blood Plasma: Typical water-suppressed 1 dimensional, spin-echo and DIRE spectra of plasma (control plus patient) are shown in
The lipid peaks in
With respect to molecular information, the diffusion edited spectra, in which signals from molecules with rapid translational diffusion are eliminated, are almost the reverse of the CPMG spin echo spectra (
Multivariate Statistical Analysis of DIRE Spectra:
Whereas the sample distribution in the first principal component (PC) was attributable to variation in triglycerides, as shown in the plot of the loadings of principal component 1 in
Strong differentiation between the two groups and the projection of the remaining healthy (n=21) and SARS-CoV-2 positive (n=51) patients onto the training set (
Differential Diagnostic Information in DIRE Spectra: DIRE spectra of blood plasma give a clear and unequivocal modelling diagnostic for SARS-CoV-2 infection. Thus, the key molecular contributors to the SARS-CoV-2 diagnostic in the diffusion-edited and ordered spectra were the N-acetyl glycoprotein peaks GlycA and GlycB and one of the major components of the composite DIRE signals at δ 3.22. This signal comes partly from a molecule with the same molecular diffusion constant as GlycA and GlycB, the most likely being linoleoylphosphatidylcholines based on its chemical shifts and known abundance (as shown in
The DIRE spectra show strong signals from these phosphatidylcholine species including the N+—(CH3)3 head groups of compartmentalized phospholipids and signals from partially unsaturated fatty acid side chains. As the summed N+—(CH3)3 signals in DIRE spectra have multicomponent origins (analogous to the GlycA and GlycB N-acetyl singlets), they are referred to as the Supramolecular Phospholipid Composite signals (SPC). The summed SPC integrals (SPCtotal) were used in statistical analysis of the relationships with the GlycA and GlycB signals in controls, SARS-CoV-2 negative patients and SARS-CoV-2 positive patients; the results are shown below in Table 6.
Table 6 shows relative intensity group medians for GlycA, GlycB, SPC signal variables and their ratios, together with p values for two comparisons: healthy controls vs SARS-CoV-2 positive patients, and SARS-CoV-2 negative patients vs SARS-CoV-2 positive patients. All p values shown in the table were determined using the Kruskal-Wallis rank sum test. GlycA and GlycB refer, respectively, to N-acetyl glycoprotein fragments A and B, and SPCtotal refers to the supramolecular phospholipid composite total signal. The label NS indicates “not significant.”
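For illustration, the Kruskal-Wallis rank sum test used for the pairwise group comparisons may be sketched as follows. The sketch assumes the standard tie-corrected H statistic and, for the two-group case only, the chi-square approximation with one degree of freedom; the function names are hypothetical and the example values are not study data.

```python
import math

def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of = {}   # value -> average 1-based rank in the pooled sample
    tie_term = 0   # sum of (t^3 - t) over runs of t tied values
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction

def p_value_two_groups(h):
    """Chi-square (1 degree of freedom) upper tail; valid for two groups only."""
    return math.erfc(math.sqrt(max(h, 0.0) / 2.0))

# Toy relative intensities for two groups (not study data).
h = kruskal_wallis_h([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
p = p_value_two_groups(h)
```

With more than two groups, the p value would instead require the chi-square distribution with (number of groups minus 1) degrees of freedom, which is not implemented in this sketch.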
The triglyceride signals observed in DIRE spectra carried little direct diagnostic information, being more closely reflective of body mass index than infection status. On closer inspection of the NMR data, SPC was seen to be composed of two major signals with separate average chemical shifts and linewidths (as shown in
Statistical Total Correlation Spectroscopic Analysis of DIRE Spectra: Statistical TOtal Correlation SpectroscopY (STOCSY) allows structural connectivity to be established based on the covariance of proton signals from the same molecules across a series of spectra collected in parallel. The results of a STOCSY analysis of the DIRE spectra using the GlycA signal (δ 2.03) as the statistical driver peak, which highlights structurally correlated signals from other glycan protons of GlycA in the region from δ 3.5 to 4.3, with the highest correlations (>0.9) at δ 3.7 and δ 3.9, are shown in
As shown in
The observed STOCSY signal pattern is in good agreement with the expected chemical shifts for phosphatidylcholines as the signals at δ 3.69 and δ 4.34 correspond to the methylene groups in the choline moiety and the remaining signals belong to the attached alkyl chain. Notably, the shift for the signals at δ 0.8 and δ 1.3 matches the reported shifts for HDL particles, indicating that the STOCSY-highlighted phosphatidylcholine might be incorporated in HDL. Special attention is given to the complex multiplet signal at ca. δ 2.7 (
To corroborate the results from the STOCSY analysis, titrations of potential candidate molecules matching the observed signals were carried out.
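The STOCSY analysis referred to above can be illustrated by the following minimal sketch, which assumes the basic form of the method: the Pearson correlation of every spectral variable with a chosen driver variable across a set of spectra. The function name `stocsy` and the toy intensity matrix are hypothetical.

```python
def stocsy(spectra, driver_index):
    """Correlate every spectral variable with the driver variable.

    spectra: list of spectra (rows), each a list of intensities on a common
    chemical shift grid. Returns one Pearson coefficient per variable;
    constant variables are assigned a correlation of 0.0.
    """
    n = len(spectra)
    columns = list(zip(*spectra))
    driver = columns[driver_index]
    driver_mean = sum(driver) / n
    driver_dev = [v - driver_mean for v in driver]
    driver_ss = sum(v * v for v in driver_dev)
    correlations = []
    for column in columns:
        col_mean = sum(column) / n
        col_dev = [v - col_mean for v in column]
        col_ss = sum(v * v for v in col_dev)
        denom = (driver_ss * col_ss) ** 0.5
        cross = sum(a * b for a, b in zip(driver_dev, col_dev))
        correlations.append(cross / denom if denom > 0 else 0.0)
    return correlations

# Toy matrix: variable 1 co-varies with the driver (variable 0), variable 2
# varies in the opposite direction, and variable 3 is constant.
toy = [[1, 2, 5, 3], [2, 4, 4, 3], [3, 6, 3, 3], [4, 8, 1, 3]]
r = stocsy(toy, driver_index=0)
```

Variables whose coefficients approach 1 are candidates for belonging to the same molecule, or the same compartment, as the driver peak.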
In order to confirm the identity of the choline moiety unambiguously, a set of hetero- and homonuclear NMR experiments was performed: 1H-1H Diffusion Edited Total Correlation Spectroscopy (DE-TOCSY); 1H-13C Heteronuclear Single Quantum Coherence (HSQC); and 1H-13C Heteronuclear Multiple Bond Correlation (HMBC). For DE-TOCSY, pre-processed plasma samples were analysed at 310 K using the Bruker BioSpin Corp. pulse program ledbpgpm12s2dp with an additional pre-saturation in sequence, with 8 scans, 32 dummy scans, 512 data points in the F1 dimension and 2048 data points in the F2 dimension; D1=1.5 s, spectral width 13 ppm, O1 at 4.7 ppm, D9=90 ms, big Delta=150 ms, little Delta=3 ms, z-field gradient strength 26.75 G/cm. HSQC samples, prepared as described herein, were analysed at 310 K. Pulse program: hsqcedetgpisp2.3 with 76 scans, 32 dummy scans, 512 data points in the F1 dimension, 4096 data points in the F2 dimension. D1=2 s, spectral width in the F1 dimension was 190 ppm, spectral width in the F2 dimension was 16 ppm. O1 in the F1 dimension was 90 ppm and O1 in the F2 dimension was 4.7 ppm. Total experiment time was 24 hours and 18 minutes. HMBC samples were prepared as described herein and analysed at 310 K. Pulse program: hmbcetgpl3nd with additional presaturation in sequence, with 72 scans, 16 dummy scans, 512 data points in the F1 dimension, 4096 data points in the F2 dimension. D1=2 s, spectral width in the F1 dimension was 230 ppm, spectral width in the F2 dimension was 13 ppm. O1 in the F1 dimension was 105 ppm and O1 in the F2 dimension was 4.7 ppm. Total experiment time was 24 hours and 13 minutes.
First, the 1H13C edited HSQC NMR plot of
To get further insight into the nature of the phosphatidylcholine signals highlighted by STOCSY, titrations of various phosphatidylcholine standards into plasma were performed, as shown in the NMR plots of
The NMR data were processed manually using Bruker Topspin™ 3.6.2 to achieve optimal phasing, baseline correction and spectral alignment. Difference spectra were calculated by subtracting the original (non-spiked) spectrum from the spectrum containing the standard solution. All tested standards gave rise to a similar signal pattern with an observable increase of SPC (SPC1: δ 3.22; SPC3: δ 3.26) as well as two signals at δ 3.69 and δ 4.34 corresponding to the methylene groups in the choline moiety, as shown in
Both the relative intensities of GlycA and GlycB and their sums give extremely good discrimination between healthy and SARS-CoV-2 positive individuals with Kruskal-Wallis p-values in the 10⁻⁹ to 10⁻¹⁰ range (Table 6). The differences for healthy versus SARS-CoV-2 negatives and controls were significant, but much weaker, as were the differences between SARS-CoV-2 negatives and positives. This relation can also be observed in
In
The statistical relationships between the measured GlycA, GlycB and SPCtotal signals and the IVDr lipoprotein parameters derived from the same samples were investigated by the standard B.I.LISA method. There exist complex relationships between the lipoprotein patterns and other metabolic and cytokine data from COVID-19 patients, such as the inflammatory driven connections to COVID-19 dyslipidemia (elevated VLDL and LDL, and elevated Apolipoprotein A1/B100) and their possible implications in new onset diabetes and cardiovascular/atherosclerotic risk. Here, the lipoprotein data are used to establish a structural and compartmental connectivity to the novel SPCtotal data and the SPCtotal/Glycoprotein ratios. A strong pattern of correlation emerges between the SPCtotal and total plasma and total HDL Apolipoprotein A1 and A2 levels. This is because most of the plasma apolipoproteins are carried on HDL, which is significantly reduced in COVID-19. Similarly, there is a strong correlation between the SPCtotal signal and multiple HDL fraction concentrations, because the HDL phospholipids are in the same structural compartment as, for instance, the free cholesterol and total cholesterol. The exception is the weaker correlations with HDL-1 and HDL-2 fractions (phospholipids and cholesterol) because these are much less reduced in the disease. Thus, it may be inferred that a significant proportion of the SPCtotal component is present in the HDL subfraction 3 and HDL subfraction 4. There is also a correlation between the SPCtotal peak and the LDL-3, LDL-4, LDL-5 and LDL-6 peaks, but these are much weaker than the HDL correlations.
So, on the basis of the diffusion edited STOCSY and the statistical IVDr correlations, one may conclude that the main composite SPCtotal diagnostic markers are from phospholipids in HDL-3 and HDL-4 with a contribution from the lysophosphatidylcholine (including a linoleoyl, 18:2 species) bound to α-1-acid glycoprotein, both of which are significantly lowered in COVID-19 disease. The fact that the α-1-acid glycoprotein is significantly elevated in COVID-19 as part of the inflammatory response makes the various ratios of GlycA/GlycB and SPC components particularly sensitive to the presence of the disease (Table 6).
The results also indicate that JEDI provides inflammation marker quantification by simple peak integration.
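As an illustration of quantification by simple peak integration, the following sketch integrates a spectrum over a chemical shift window by the trapezoidal rule and forms an SPC/Glyc ratio. The default windows are the SPC (δ 3.17-3.33) and GlycA (δ 2.05-2.09) regions quoted for the signal-to-noise evaluation herein; the function names are hypothetical and the windows are passed as arguments so that other regions can be used.

```python
def integrate_region(ppm, intensity, low, high):
    """Trapezoidal integral of intensity over low <= ppm <= high."""
    points = sorted((p, y) for p, y in zip(ppm, intensity) if low <= p <= high)
    return sum((p2 - p1) * (y1 + y2) / 2
               for (p1, y1), (p2, y2) in zip(points, points[1:]))

def spc_glyc_ratio(ppm, intensity, spc=(3.17, 3.33), glyc=(2.05, 2.09)):
    """Ratio of the SPC integral to the Glyc integral (windows are adjustable)."""
    return (integrate_region(ppm, intensity, *spc)
            / integrate_region(ppm, intensity, *glyc))

# Toy spectrum (not study data): one triangular peak in each window.
ppm = [2.05, 2.07, 2.09, 3.17, 3.25, 3.33]
intensity = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0]
ratio = spc_glyc_ratio(ppm, intensity)
```

The sorting step makes the integral independent of whether the chemical shift axis is stored in ascending or descending order.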
It was also possible to perform an evaluation of the effectiveness of JEDI to filter lipoprotein contributions in various inflammatory states. Focussing on a serum sample with high levels of lipid/lipoprotein, shown as a gray dotted line in
In another control experiment, the lipoprotein signal itself was investigated by selective TOCSY (TOtal Correlation SpectroscopY), the results of which are shown in
Comparing the lipoprotein signature obtained from the selective TOCSY spectra with the ones obtained from spiked lipid mixture spectra suggests that the high frequency shoulder of the lipoproteins is more attenuated than its low frequency counterpart as shown, for example, in
The quartet pattern of the lipid signal is believed to be a pseudo-quartet and the result of an underlying doublet-of-triplets (dt) stemming from the coupling to the adjacent allylic proton attached to the sp2 carbon (resulting in a doublet) and on the other end the adjacent —CH2— (resulting in a triplet). The chemical shift of the two pseudo-quartets perfectly overlaps with the lipoprotein envelope of a high-concentration lipoprotein plasma sample (as shown by the black dotted circles in
The optimal parameters for the JEDI-PGPE are chosen to find a compromise between acceptable signal-to-noise (S/N) ratio and effective removal of the interfering peaks.
In addition to the JEDI-PGPE, three other related JEDI sequences were tested: the Pulsed Gradient Spin Echo (PGSE) (shown in
The results indicate that JEDI spectra of serum and plasma differ only by the contributions of fibrinogen.
The DIRE of
The general idea of JEDI is to combine T2 relaxation, diffusion and J-editing in one compact sequence and to tailor the parameters for the quantification of SPC and Glyc while also retaining a high signal-to-noise ratio for application in high throughput serum and plasma analysis. Here, the T2 relaxation period has to be long enough to allow for relaxation of the broad protein signal background in a serum or plasma spectrum, and diffusion has to be adjusted to remove all small molecule contributions. Among the plethora of NMR editing approaches, suppression of coupled spin systems is achieved by spin echoes with evolution delays tailored towards the signal which is to be suppressed (J-editing). For perfect suppression of a signal with a given coupling constant by J-editing, one would adjust the total duration of a spin echo to 1/(2J), at which point the signal is pure antiphase magnetization. Although this theoretical background was taken into consideration (coupling constants for the respective lines in question from standards, like L-α-phosphatidylcholine in CDCl3 at 310 K, were determined to be ~7 Hz, resulting in 1/(2J)=~71 ms), spin echo delays still had to be determined experimentally, because the 1/(2J) relation only holds true for one specific coupling constant at a time and the lipoprotein signal overlapping Glyc comprises an array of different signals with different coupling constants. Additionally, an attempt was made to minimize relaxation losses by opting for short durations of the spin echoes.
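The 1/(2J) relation above can be checked with a short worked calculation. The sketch below assumes the textbook cos(πJτ) modulation of the in-phase signal of a simple doublet over a spin echo of total duration τ and ignores relaxation; the function names are hypothetical.

```python
import math

def antiphase_echo_duration(j_hz):
    """Total spin echo duration (s) at which a doublet is purely antiphase."""
    return 1.0 / (2.0 * j_hz)

def inphase_fraction(j_hz, echo_s):
    """In-phase fraction cos(pi * J * tau) of a doublet; relaxation ignored."""
    return math.cos(math.pi * j_hz * echo_s)

tau = antiphase_echo_duration(7.0)           # about 0.0714 s for J of 7 Hz
residual_7hz = inphase_fraction(7.0, tau)    # essentially zero
residual_6hz = inphase_fraction(6.0, tau)    # about +0.22
residual_8hz = inphase_fraction(8.0, tau)    # about -0.22
```

Under these assumptions a delay tuned to 7 Hz leaves roughly a fifth of the in-phase signal for couplings of 6 Hz or 8 Hz, which is consistent with the need to determine the spin echo delays experimentally for a composite lipoprotein signal.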
In summary, a JEDI sequence should contain a diffusion scheme and one or more spin echo blocks, and the magnetization must be stored in the xy-plane for a sufficient amount of time to allow T2 relaxation.
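How a diffusion scheme and a period of T2 relaxation act together as a filter can be sketched as follows. The sketch assumes the Stejskal-Tanner expression for rectangular gradient pulses, which a real bipolar or shaped-gradient sequence only approximates, and a single-exponential T2 decay. The gradient settings are those listed herein for the DE-TOCSY experiment (26.75 G/cm, little Delta 3 ms, big Delta 150 ms) and are used purely as an example; the diffusion coefficients, T2 values and echo time are hypothetical round numbers, not measured values.

```python
import math

GAMMA_1H = 2.675e8  # proton gyromagnetic ratio, rad s^-1 T^-1

def diffusion_b_value(gradient_t_per_m, little_delta_s, big_delta_s):
    """Stejskal-Tanner b value (s/m^2), assuming rectangular gradient pulses."""
    q = GAMMA_1H * gradient_t_per_m * little_delta_s
    return q ** 2 * (big_delta_s - little_delta_s / 3.0)

def surviving_fraction(diffusion_m2_s, t2_s, b_value, echo_s):
    """Signal fraction left after diffusion weighting and T2 decay."""
    return math.exp(-b_value * diffusion_m2_s) * math.exp(-echo_s / t2_s)

# 26.75 G/cm corresponds to 0.2675 T/m.
b = diffusion_b_value(0.2675, 0.003, 0.150)

# Hypothetical species (illustrative values only), with a 70 ms echo time.
small_molecule = surviving_fraction(1e-9, 1.0, b, 0.07)    # fast diffusion
lipoprotein = surviving_fraction(5e-11, 0.1, b, 0.07)      # slow diffusion
protein_background = surviving_fraction(5e-11, 0.01, b, 0.07)  # very short T2
```

With these illustrative numbers the fast-diffusing small molecule and the short-T2 protein background are each reduced to well below 1% of their initial signal, whereas the slowly diffusing lipoprotein-type signal with a longer T2 retains roughly a third.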
The Pulsed Gradient Spin Echo (PGSE) (
The Pulsed Gradient Double Echo (PGDE) (
The Pulse Gradient Spin Echo×5 (PGSE-5) (
Another trade-off had to be made for the interscan delay (relaxation delay+acquisition time). For the determination of the interscan delay, three signals in the JEDI spectrum are of interest: first, SPC; second, Glyc; and third, the residual lipoprotein signal, which can interfere with Glyc. Rough estimation of the T1 relaxation times by inversion recovery experiments yielded the following order: SPC<lipoprotein<Glyc. This means that for short interscan delays (~2 s) the signal per unit time is well suited for SPC, but the lipoproteins interfere more strongly with Glyc because the lipoproteins are quicker to return to equilibrium. On the other hand, for long interscan delays (~7 s) Glyc is more interference-free but signal-to-noise is “wasted” for SPC. Hence, the interscan delay was determined experimentally, with a value of ~4 s, by evaluating overall signal-to-noise against lipoprotein interference with Glyc. Other sequences which use J-editing, like the HAL and the SQF experiment proposed by Kojima et al., were also modified (diffusion) and tested, but did not yield better (or acceptable) results compared to the JEDI sequences described above.
Signal-to-noise performance for the SPC (δ=3.17-3.33) and GlycA (δ=2.05-2.09) regions, evaluated for different sequences including IVDr methods, DIRE and the JEDI approach. Signal-to-noise was extracted in TOPSPIN using the .sino application (noise region δ=−2.6 to −5.2).
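The interscan delay trade-off can be illustrated with a simple saturation-recovery model. The sketch assumes that the magnetization recovers from zero as 1 − exp(−delay/T1) between scans and that noise averages as the square root of the number of scans, so that sensitivity per unit time scales as the recovered fraction divided by the square root of the delay. Only the ordering SPC<lipoprotein<Glyc is taken from the text; the T1 values below are hypothetical, as are the function names.

```python
import math

def recovered_fraction(interscan_s, t1_s):
    """Fraction of equilibrium magnetization recovered between scans."""
    return 1.0 - math.exp(-interscan_s / t1_s)

def sensitivity_per_unit_time(interscan_s, t1_s):
    """Relative signal-to-noise per square root of measurement time."""
    return recovered_fraction(interscan_s, t1_s) / math.sqrt(interscan_s)

# Hypothetical T1 values (s) that only respect the reported ordering.
T1 = {"SPC": 0.5, "lipoprotein": 1.0, "Glyc": 2.0}

for delay in (2.0, 4.0, 7.0):
    spc_efficiency = sensitivity_per_unit_time(delay, T1["SPC"])
    # Relative weight of the interfering lipoprotein signal versus Glyc.
    interference = (recovered_fraction(delay, T1["lipoprotein"])
                    / recovered_fraction(delay, T1["Glyc"]))
    print(f"{delay:.0f} s: SPC efficiency {spc_efficiency:.2f}, "
          f"lipoprotein/Glyc recovery {interference:.2f}")
```

In this model the SPC efficiency falls as the delay lengthens, while the lipoprotein/Glyc recovery ratio approaches 1, which mirrors the qualitative compromise described above without reproducing the experimentally chosen value.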
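For orientation, a simplified signal-to-noise estimate over a signal window and a signal-free noise window is sketched below. It uses one common NMR convention, peak height divided by twice the root-mean-square noise, and is not TOPSPIN's own noise calculation, which differs in detail; the function name and the toy data are hypothetical.

```python
def signal_to_noise(ppm, intensity, signal_window, noise_window):
    """Peak height in the signal window divided by twice the RMS noise."""
    signal = [y for p, y in zip(ppm, intensity)
              if signal_window[0] <= p <= signal_window[1]]
    noise = [y for p, y in zip(ppm, intensity)
             if noise_window[0] <= p <= noise_window[1]]
    noise_mean = sum(noise) / len(noise)
    rms = (sum((y - noise_mean) ** 2 for y in noise) / len(noise)) ** 0.5
    return max(signal) / (2.0 * rms)

# Toy data (not study data): four noise points and two points in the SPC window.
ppm = [-5.0, -4.5, -4.0, -3.0, 3.20, 3.25]
intensity = [1.0, -1.0, 1.0, -1.0, 10.0, 20.0]
sn_spc = signal_to_noise(ppm, intensity, (3.17, 3.33), (-5.2, -2.6))
```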
Next to the sub-splitting of the SPC region (δ=3.17-3.33) into a low-frequency (SPC-A) and high-frequency (SPC-B) region according to the main fractions of HDL (SPC-A) and LDL (SPC-B), it is also possible to further subdivide the SPC region by correlation analysis. This is demonstrated by the control and SARS-CoV-2 positive heatmaps shown, respectively, in
The SPC/Glyc ratio proved to be a useful marker for disease recovery.
It has been shown that SPC (and its sub-regions), Glyc (and its sub-regions) and the SPC/Glyc ratio are powerful markers to assess acute inflammation during the acute phase of COVID infection and they can also be used to interrogate recovery after the acute phase of COVID. Furthermore, the markers SPC (and its sub-regions), Glyc (and its sub-regions) and the SPC/Glyc ratio can be employed as general markers of inflammation, e.g., chronic inflammation, not just limited to COVID.
As described herein, a combination of diffusion, relaxation and J-coupling edited NMR spectroscopy of blood plasma provides excellent discrimination of SARS-CoV-2 positivity from controls or SARS-CoV-2 negative patients based on the enhanced detection of occult diagnostic compartments. The same strategy yields excellent markers for inflammation (GlycA and GlycB) and cardiovascular risk (SPC3/SPC2 ratio). The key diagnostic species are from the total composite NMR signals (the supramolecular phospholipid composite, SPC) from terminal head groups in phospholipids, together with (lyso-)phosphatidylcholine and sphingomyelin, from HDL and LDL and the glycoprotein N-acetyl composite signal Glyc. Glyc has several distinguishable components, currently assigned as GlycA, GlycB (main contributions) and other glycoproteins such as fibrinogen (in lesser amounts). SPC has several distinguishable components (currently described as SPC1, SPC2 and SPC3) that are associated with the different lipoprotein fractions and subfractions; these carry different but related diagnostic information. These markers appear to offer excellent diagnostic discrimination, and the high speed of the DIRE/JEDI experiments (or similar experiments such as PGSE, PGDE and PGPE) could be exploited as a direct phenoconversion test to help augment conventional PCR results, which could be of value in biosecurity situations. For many COVID-19 patients, recovery is slow or incomplete, and SPC/Glyc ratios could also be employed as measures of functional systemic recovery. The SPC3/SPC2 ratio could be used to evaluate the shift in cardiovascular risk to which infected people are exposed. The DIRE/JEDI diagnostic is unusual in that it utilises the dynamic motional properties of the biomarker molecules as well as concentration variations to enhance classification of the disease over methods employing simple concentration metrics, and thus represents a new class of molecular dynamic diagnostic.
These data also illustrate well how untargeted NMR spectroscopic exploration can be readily translated into targeted measurements that can be performed on the same sample within the same experiment.
A system that may be used for performing the necessary measurements and diagnosis is shown in
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2022/050593 | 1/24/2022 | WO | |

Number | Date | Country | |
---|---|---|---|
63145155 | Feb 2021 | US | |
63140732 | Jan 2021 | US | |