The present disclosure relates to a computer-implemented method, a method for supervised training of a machine learning model for predicting BPD, and a system, for predicting a risk of an infant developing bronchopulmonary dysplasia (BPD).
Prematurely born infants, especially those born before 28 weeks of gestation, have very few alveoli at birth. The alveoli that are present tend not to be mature enough to function normally, and the infant may require respiratory support with oxygen to sustain breathing.
Bronchopulmonary dysplasia (BPD) is typically suspected when a ventilated infant is unable to wean from prolonged high oxygen delivery. Various diagnostic criteria for BPD exist, but they commonly rely on the patient requiring supplemental oxygen for an extended time following birth, most often 28 days. If this criterion is fulfilled, chest x-rays of the patient are typically taken and examined for signs that are characteristic of BPD, including emphysema, pulmonary scarring, and atelectasis.
While clinical classification of BPD relies on the assessment of supplemental oxygen supply at a later stage in life, typically at the 28th day of life, it is known that early treatments, including administration of steroids before the eighth day of life, can prevent development of BPD. The risk associated with said treatments may however outweigh the benefits, making treatment only a suitable option after confirmation of the disease. Thereby, there is a significant need for early prediction of development of BPD, as it can help decrease both the associated short-term and long-term effects of the disease.
Early prediction of development of BPD is of paramount importance for an effective intervention of the disease. Various clinical factors and biomarkers have been investigated for the assessment of the risk of an infant developing BPD, such as clinical scoring systems, plasma proteome analyses, and blood-cell counting (neutrophil-to-lymphocyte ratio).
The present inventors have realized that development of BPD can be predicted, with high sensitivity and specificity, early after birth, by analysis of gastric aspirate (GAS) data, clinical data and lung maturity data. The early prediction of development of BPD enables the possibility of ensuring adequate treatment of the infant, and thereby, providing potential for decreasing the significant mortality and morbidity associated with the disease.
The present invention therefore, in a first aspect, relates to a computer-implemented method for predicting risk of an infant developing bronchopulmonary dysplasia (BPD), the method comprising the steps of:
The GAS data is preferably provided as spectroscopy data, for example mid-infrared spectroscopy data. A preferred spectrum for GAS data includes the wavenumbers in the range 900-3400 cm−1, such as in the range 900-1800 cm−1 and in the range 2800-3400 cm−1. FTIR spectral data, e.g. measurement data at spectral lines indicative of development of BPD, may be selected and form a basis, together with additional data of the dataset, for the prediction of development of BPD in the infant.
Secondly, the dataset may include clinical data comprising markers associated with development of BPD, such as gestational age and/or birth weight.
Thirdly, the dataset may further comprise lung maturity data indicative of the maturity of the lungs. Preferably, the lung maturity data is provided in the form of a binary value (+/−) of whether the infant has been given, or is to be given, surfactant treatment.
Surfactant treatment (surfactant replacement therapy) may for example be given to infants with RDS in order to keep the alveoli from sticking together, and is in most cases administered in combination with supplemental oxygen or mechanical ventilation to help the infant breathe.
In a further aspect, the present invention relates to a method for supervised training of a machine learning model for predicting, early after birth, if a subject (e.g. an infant) is at risk of developing BPD. Preferably, the method comprises obtaining a dataset comprising information of a number of infants, shortly after birth. A machine learning model may thereafter be trained based on said dataset, together with outcome data comprising information related to whether said infants had, or developed, BPD. The dataset preferably comprises clinical data, lung maturity data and/or GAS data.
As shown by the present inventors, gastric aspirate of infants that develop BPD soon after birth and gastric aspirate of infants that do not develop BPD are distinct. In fact, gastric aspirate, which is mainly produced in the foetal lungs, provides a highly detailed digital fingerprint of the foetal lung biochemistry, which may be used to predict development of BPD.
In an embodiment of the present disclosure, an artificial intelligence (AI) model is trained, based on outcome data, to select data points or spectral lines of a gastric aspirate measurement, wherein the data points or spectral lines are selected to most accurately distinguish between infants that develop BPD and those who do not develop BPD. As such, the training of the machine learning model may not require a priori knowledge of the relevant molecules and biomarkers of the gastric aspirate. The training might be supervised training of the AI model.
In yet a further aspect, the present invention relates to a system for predicting if an infant, early after birth, is at risk of developing BPD, the system comprising a memory, and a processing unit that is configured to carry out the computer-implemented method as disclosed herein. Preferably said system further comprises at least one spectrometry unit for obtaining spectrometry data, such as a spectrometer.
In a first aspect, the present disclosure relates to a computer-implemented method for predicting a risk of an infant developing bronchopulmonary dysplasia (BPD). The method comprises the steps of: obtaining a dataset of the infant, the dataset comprising clinical data; lung maturity data; and gastric aspirate (GAS) data; analysing said dataset, thereby obtaining an analysed data result; and based on said analysed data result predicting the risk of the infant developing BPD.
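By way of non-limiting illustration, the three steps above (obtaining the dataset, analysing it, and predicting the risk) may be sketched in Python as follows. The names `InfantDataset`, `to_feature_vector` and `predict_bpd_risk` are hypothetical and do not appear in the disclosure; the model is passed in as an arbitrary callable, since the disclosure does not fix a particular model at this point.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class InfantDataset:
    """Hypothetical container for the three data groups of the dataset."""
    gestational_age_weeks: float   # clinical data
    birth_weight_g: float          # clinical data
    surfactant_given: bool         # lung maturity data as a binary (+/-) value
    gas_features: Sequence[float]  # GAS data, e.g. values derived from a spectrum

    def to_feature_vector(self) -> List[float]:
        """Flatten the dataset into one numeric vector for analysis."""
        return [
            self.gestational_age_weeks,
            self.birth_weight_g,
            1.0 if self.surfactant_given else 0.0,
            *self.gas_features,
        ]


def predict_bpd_risk(dataset: InfantDataset,
                     model: Callable[[Sequence[float]], float]) -> float:
    """Analyse the dataset with the supplied model and return a risk in [0, 1]."""
    risk = model(dataset.to_feature_vector())
    if not 0.0 <= risk <= 1.0:
        raise ValueError("model must return a probability between 0 and 1")
    return risk
```

Any analysis that maps the feature vector to a probability, for example a trained machine learning model, could be supplied as `model` in this sketch.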
In a preferred embodiment of the present disclosure, the analysed data result is obtained by analysing the dataset by a trained machine learning model. Thereby, no human intervention may be needed for carrying out the analysis, and the trained machine learning model may be continuously optimized based on new data, e.g. training data.
Preterm birth, also known as premature birth, is the birth of a baby at fewer than 37 weeks' gestational age, as opposed to the usual about 40 weeks. Thereby, in yet a preferred embodiment of the present disclosure, the infant is a preterm born infant, such as an infant born before 37 weeks of pregnancy are completed. The infant may however be born at an earlier stage of pregnancy, such as less than 35 weeks' gestational age, or even less than 30 weeks' gestational age. The risk of development of BPD is higher at a lower gestational age.
A likely cause of this correlation is the less developed lungs of an early born infant. In general, around 16-26 weeks postmenstrual age (PMA) alveoli and lung capillaries are formed. After around 26 weeks PMA, the saccules grow in size while at around week 32 the alveoli develop. Thereby, a premature birth may be associated with underdeveloped lungs, wherein a lower gestational age means less developed lungs. The incidence of BPD in surviving infants less than or equal to 28 weeks gestational age has been relatively stable at approximately 40% over the last few decades.
A significant advantage with the presently disclosed method is that it enables early prediction of development of BPD. Consequently, in an embodiment of the present disclosure, the dataset comprises or consists of data obtained within 48 hours after birth, more preferably within 36 hours after birth, most preferably within 24 hours after birth, such as at birth. The earlier the data of the dataset can be obtained, the earlier a prediction of the development of BPD in an infant can be made, and consequently, the earlier a targeted intervention can be started, having the potential to significantly improve outcome. The early intervention may comprise preventative and targeted prophylactic, therapeutic intervention with surfactant and new medicaments, and/or the mode of ventilation. Various strategies for treatment and preventive therapy of BPD are known to a person skilled in the art.
GAS Data
In an embodiment of the present disclosure the GAS data is derived from, such as comprises or consists of, spectroscopy data, for example mid-infrared spectroscopy data. The GAS data may be derived from, or comprise, spectroscopy data in the spectrum between 900-3400 cm−1, such as between 900-1800 cm−1 and between 2800-3400 cm−1. Spectroscopy measurements of GAS, for example by FTIR spectroscopy, enable derivation of a highly detailed digital fingerprint of the foetal lung biochemistry. Thereby, GAS data may comprise FTIR spectral wavenumbers and/or absorption intensities and may, combined with other markers, be evaluated for the prediction of BPD. The highly detailed digital fingerprint of the foetal lung biochemistry is at least in part due to GAS comprising fluid that is produced in the foetal lungs.
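As a minimal sketch of restricting a measured spectrum to the ranges named above, the following Python function keeps only the points between 900-1800 cm−1 and between 2800-3400 cm−1. The function name `select_ranges` and the representation of a spectrum as two parallel lists are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Ranges in cm^-1 taken from the text above; endpoints are treated as inclusive.
DEFAULT_RANGES = ((900.0, 1800.0), (2800.0, 3400.0))


def select_ranges(wavenumbers: Sequence[float],
                  absorbances: Sequence[float],
                  ranges: Sequence[Tuple[float, float]] = DEFAULT_RANGES,
                  ) -> Tuple[List[float], List[float]]:
    """Keep only the points whose wavenumber falls inside one of the ranges."""
    if len(wavenumbers) != len(absorbances):
        raise ValueError("wavenumbers and absorbances must have equal length")
    kept = [(w, a) for w, a in zip(wavenumbers, absorbances)
            if any(low <= w <= high for low, high in ranges)]
    return [w for w, _ in kept], [a for _, a in kept]
```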
In an embodiment of the present disclosure the GAS data is derived from, such as comprises or consists of, one or more absorption and/or one or more transmission spectra. The GAS data may consist of data derived from a single spectroscopy measurement, or the GAS data may comprise data derived from multiple spectroscopy measurements. Furthermore, the multiple measurements may have been carried out on different types of bodily fluids. In a preferred embodiment of the present disclosure the GAS data is derived from measurements of a GAS sample, such as a pretreated GAS sample.
Spectroscopy Measurements
In a preferred embodiment of the present disclosure the GAS data is derived from spectroscopy data. The spectroscopy data may have been obtained by spectroscopic analysis of GAS sample(s). The spectroscopy data may reflect the absorption of the GAS sample in the mid-infrared region (3200-900 cm−1).
The GAS data is preferably derived from measurements of a GAS sample. A GAS sample preferably comprises or consists of gastric aspirates. Alternatively or additionally, a GAS sample may comprise or consist of other bodily fluids, such as pharyngeal secretion (e.g. hypopharyngeal secretions or oropharyngeal secretions) and amniotic fluids, or a combination thereof. Preferably, the GAS sample(s) is substantially dry during the analysis/measurement.
Pretreatment
In an embodiment of the present disclosure the GAS sample is, preferably non-invasively, pretreated prior to spectroscopic analysis. Pretreatment of the GAS sample may for example comprise or consist of centrifugation for formation of a precipitate, and discarding the supernatant. Alternatively or additionally, pretreatment may comprise storage, preferably cold storage, such as around 4° C.
In a preferred embodiment of the present disclosure the GAS data and/or lung maturity data is derived from measurements of a bodily fluid, such as gastric aspirates (GAS), pharyngeal secretion (e.g. hypopharyngeal secretions or oropharyngeal secretions) or amniotic fluids, that has been pretreated.
Pretreatment of a bodily fluid may for example comprise or consist of cell lysis, e.g. by mixing with a hypotonic solution, centrifugation for formation of a precipitate, and preferably subsequently discarding the supernatant. Alternatively, or additionally, pretreatment may comprise storage, preferably cold storage, such as around 4° C., or even below the melting point.
Erythrocytes and other cells are often present in GAS. To reduce the contamination of GAS from these sources in order to improve the phospholipid measurements, it has earlier been common practice to centrifuge amniotic fluid or GAS and subsequently discard the precipitate prior to measurement of L/S. However, this procedure reduces the amount of surfactant, resulting in less accurate measurements of lung maturity.
Instead, it is a preference that lung maturity data is derived from measurements, such as measurement data, of a bodily fluid, such as GAS, wherein the cells of the bodily fluid have been lysed, such as by mixing with a hypotonic solution. It is further a preference that the bodily fluid, subsequent to lysis, has been centrifuged at a relative centrifugal force (RCF) and time selected such that the lamellar bodies (LBs) of the bodily fluid form a precipitate while the cell fragments, of e.g. lysed cells, and other smaller components, such as salts, remain in the supernatant. An adequate RCF and time may for example be around 4000 g and four minutes. Preferably, the supernatant is discarded following centrifugation. It is further a preference that the measurements of the, preferably diluted and centrifuged, bodily fluid comprise FTIR measurements. The FTIR measurements may thereby be measurements, e.g. dry transmission FTIR, of the LB precipitate for assessment of the lung maturity.
Sphingomyelin is typically sparsely present in the outer membranes of erythrocytes. Therefore, effective removal of erythrocytes before measurements, such as by spectroscopy, e.g. FTIR, may result in slightly increased L/S values, as compared to without removal of erythrocytes. The corresponding L/S cut-off value may as a consequence be higher than without removal of erythrocytes.
In a preferred embodiment of the present disclosure pretreatment of the bodily fluid comprises dilution with a hypotonic liquid, such as a water solution, e.g. freshwater. Dilution by a low osmolality liquid, such as freshwater, exposes the bodily fluid to hypotonic conditions, causing any present cells, such as erythrocytes, to burst. Preferably the pretreatment further comprises centrifugation of the diluted bodily fluid. The centrifugation is preferably carried out at a relative centrifugal force and for a time, such as around 4000 g for four minutes, such that the lysates (e.g. ruptured membranes of erythrocytes) and other small components of the solution (e.g. proteins and/or salts) end up in the supernatant while the LBs form a precipitate. Preferably, the supernatant is thereafter discarded. It is further a preference that the measurements of the, preferably diluted and centrifuged, bodily fluid comprise FTIR measurements. The FTIR measurements may thereby be a measurement of the LB precipitate for assessment of the lung maturity.
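Since a centrifuge is usually set in revolutions per minute rather than in multiples of g, the following sketch converts between the two using the standard relation RCF = 1.118 × 10⁻⁵ · r · rpm², with the rotor radius r in centimetres. The rotor radius is not given in the disclosure; the 10 cm used in the usage note is purely an assumed value, and the function names are hypothetical.

```python
import math

# Standard conversion factor for a radius in centimetres and a speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (in multiples of g) at the given speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed in rpm needed to reach the given relative centrifugal force."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be non-negative and radius_cm positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))
```

For example, `rpm_from_rcf(4000, 10.0)` gives the speed that would correspond to around 4000 g on a rotor with an assumed 10 cm radius.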
Obtaining GAS Sample
In a preferred embodiment of the present disclosure, the GAS sample has been obtained non-invasively. In a further embodiment of the present disclosure the GAS sample has been collected, from the infant, by a feeding tube in combination with means of displacing GAS through said feeding tube, such as a syringe, or a suction catheter. GAS may for example be collected using a feeding tube attached to a syringe or a suction catheter connected to a tracheal suction set. The feeding tube or suction catheters may be placed as routinely done while establishing nCPAP for respiratory stabilisation or intubation for resuscitation.
Clinical Data
In an embodiment of the present disclosure the clinical data comprises or consists of data selected from the list including birth weight, gestational age, sex, an indicator of whether the infant has been diagnosed with RDS or not, and the severity of RDS (in relevant cases), or a combination thereof. Extreme prematurity and extremely low birth weight have been well established as risk factors for BPD. Gestational age and birth weight are inversely related to the incidence of BPD, as well as to the severity of the disease. Male infants are known to have a higher risk of developing BPD as compared to females. Additional clinical markers for BPD are known, for example those outlined in Trembath et al. “Predictors of Bronchopulmonary Dysplasia”, Clin. Perinatol. 2013.
Lung Maturity Data
In a preferred embodiment of the present disclosure the lung maturity data is a binary value (+/−) representing whether the infant has been given, or is to be given, surfactant treatment or not.
If an infant is to be given surfactant treatment, the treatment is ideally started as soon as possible by the administration of a first dose. Preferably the dose should be given within 1 hour of birth, and in any case before 2 hours of age. A repeat dose should be given within 4-12 hours if the infant is still intubated and requiring more than 30 to 40% oxygen. Subsequent doses are generally withheld if the infant requires less than 30% oxygen. Typical surfactants include Survanta, Infasurf and Curosurf, each associated with specific dosing guidelines.
In an alternative embodiment of the present disclosure lung maturity data is data derived from measurements of a body fluid, for example gastric aspirates (GAS), pharyngeal secretion (e.g. hypopharyngeal secretions or oropharyngeal secretions) and amniotic fluids, or a combination thereof. The lung maturity data may be derived from a lung maturity test, for example the microbubble stability test, the lamellar body counts and/or spectroscopy measurements. Preferably, in the presently disclosed embodiment, the lung maturity data is, or is derived from, spectroscopic data. Thereby, said measurements of the body fluid may be spectroscopic measurements, preferably non-invasive.
Pulmonary surfactant is a surface-active lipoprotein complex produced in type II pneumocytes in the alveoli and secreted as lamellar bodies (LBs) with lung fluid into the amniotic fluid and GAS. The main lipid content of pulmonary surfactant is DPPC. Consequently, the lung maturity data may reflect the content, or the ratio, of a surface-active lung phospholipid, such as lecithin, e.g. dipalmitoylphosphatidylcholine (DPPC), and/or sphingomyelin. The lung maturity data may for example reflect the lecithin/sphingomyelin ratio (L/S).
In an embodiment of the present disclosure the lung maturity data is derived from, such as comprises or consists of, spectroscopy data, such as mid-infrared spectroscopy data, for assessment of lung maturity. The spectroscopy data may for example have been recorded in the mid-infrared region (3400-900 cm−1), for example by an FTIR spectrometer.
In an embodiment of the present disclosure the lung maturity data comprises one or more measurement values related to the foetal lung maturity of the infant with respect to a cut-off value. For example a measurement value related to the foetal lung maturity, of the infant, that is below (or above) said cut-off value would be associated with a higher risk of diseases related to foetal lung immaturity (such as RDS) while a measurement value above (below) said cut-off value would be associated with a lower risk of diseases related to foetal lung immaturity. The lung maturity data may thereby comprise the difference between the measurement values and the cut-off value or information whether the measurement value is above, or below, said cut-off value. Said cut-off value may be around 3, preferably around 3.05, such as 3.05 in appropriate units (e.g. moles/mol). Said cut-off value may be an L/S value.
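The two encodings described above (the difference to the cut-off value, and whether the measurement is above or below it) may be sketched as follows for an L/S measurement. The default cut-off of 3.05 is the example value given in the text; the names `LungMaturityFeature` and `encode_ls_ratio` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LungMaturityFeature:
    """Hypothetical encoding of one lung maturity measurement."""
    difference: float   # measurement value minus cut-off value
    above_cutoff: bool  # True if the measurement exceeds the cut-off


def encode_ls_ratio(ls_ratio: float, cutoff: float = 3.05) -> LungMaturityFeature:
    """Encode an L/S measurement relative to the cut-off value."""
    return LungMaturityFeature(
        difference=ls_ratio - cutoff,
        above_cutoff=ls_ratio > cutoff,
    )
```

Either field, or both, could then be included in the dataset as lung maturity data.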
The lecithin-sphingomyelin ratio (L/S or L/S ratio) is a test of foetal amniotic fluid to assess foetal lung immaturity. Lungs require surfactants to lower the surface tension of the alveoli in the lungs. This is especially important for premature babies trying to expand their lungs after birth.
The L/S is a marker of foetal lung maturity. The outward flow of pulmonary secretions from the foetal lungs into the amniotic fluid maintains the level of lecithin and sphingomyelin equally until around 32-33 weeks of gestational age, when the lecithin concentration begins to increase significantly while sphingomyelin remains nearly the same. As such, if a sample of amniotic fluid has a higher ratio, it is indicative of more surfactants in the lungs and that the infant will have less difficulty breathing at birth.
Mathematical Operations
In an embodiment of the present disclosure the GAS data is derived by application of an artificial intelligence (AI) model to the spectroscopy data. The AI model may have been developed by use of training data/outcome data, wherein no a priori knowledge of the relevant molecules and biomarkers is required.
In an embodiment of the present disclosure the GAS data is derived by application of a mathematical operation to the spectroscopy data.
The GAS data may thereby be mathematically derived from spectroscopy data. The mathematical operation may comprise denoising, smoothing, background and baseline corrections, normalization (transforming to a scale of relative intensity), alignment, correction for scatter, such as scattering in NIR, and/or filtering, or a combination thereof. The GAS data may thereby be preprocessed by any one, or any combination, of these operations.
In general, signal preprocessing is applied to correct and/or remove the contribution of undesired phenomena ranging from stochastic measurement noise to various sources of systematic errors: non-linear instrument responses, shift problems and interfering effects of undesired chemical and physical variations. These operations are also known as denoising, smoothing, background and baseline corrections, normalization (transforming to a scale of relative intensity), alignment (removing horizontal shift), and correction for scatter in near infrared. Moreover, transforming the signal, for example, by derivative operations, can implicitly accomplish normalization, baseline removal and partial band deconvolution. As far as removing horizontal shift is concerned, several algorithms which can aid to remove misalignments have been proposed.
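Two of the operations named above, baseline correction and normalization to a scale of relative intensity, may be illustrated with the simplest possible stand-ins: subtracting a constant offset and dividing by the largest absolute value. These particular choices, and the function names, are assumptions for illustration; the disclosure does not prescribe a specific baseline or normalization scheme.

```python
from typing import List, Sequence


def offset_baseline(spectrum: Sequence[float]) -> List[float]:
    """Remove a constant baseline by subtracting the minimum intensity."""
    floor = min(spectrum)
    return [value - floor for value in spectrum]


def normalize_relative(spectrum: Sequence[float]) -> List[float]:
    """Scale intensities so that the largest absolute value becomes 1."""
    peak = max(abs(value) for value in spectrum)
    if peak == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [value / peak for value in spectrum]
```

Applied in sequence, `normalize_relative(offset_baseline(spectrum))` yields intensities between 0 and 1.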
Various filtering methods are known, acting to transform the measured data mathematically into a better version of the same data, leaving out some undesired types of variation, and model-based methods, where the better version is obtained based on a more explicit mathematical model in such a way that the information filtered out is not lost, as statistical estimates of the mathematical parameters involved in the filtering are also obtained.
Among the most used filtering methods for denoising/smoothing, that is, removing uninformative high-frequency variation, are moving average and polynomial Savitzky-Golay filtering, which work on the assumptions that the signal is smooth compared to noise (sum of monotonic functions) and that noise is mainly uncorrelated and will be eliminated by mild methods. Alternatively, high-frequency contributions may be removed in the frequency (Fourier transform) or wavelet (wavelet transform) domain.
Therefore in an embodiment of the present disclosure the mathematical operation comprises or consists of a 1st order derivative. Alternatively or additionally, the mathematical operation may comprise or consist of a baseline correction algorithm, such as the Savitzky-Golay algorithm.
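A minimal sketch of a Savitzky-Golay 1st order derivative is given below, using the well-known convolution coefficients (−2, −1, 0, 1, 2)/10 for a 5-point window with a quadratic fit. The window length and polynomial order are assumptions chosen for brevity and are not specified in the disclosure; only interior points are returned, so the output is four points shorter than the input.

```python
from typing import List, Sequence

# Savitzky-Golay first-derivative coefficients, 5-point window, quadratic fit.
SG_D1_COEFFS = (-2.0, -1.0, 0.0, 1.0, 2.0)
SG_D1_NORM = 10.0


def savgol_first_derivative(values: Sequence[float],
                            spacing: float = 1.0) -> List[float]:
    """Smoothed first derivative at every interior point of the spectrum."""
    half = len(SG_D1_COEFFS) // 2
    if len(values) < len(SG_D1_COEFFS):
        raise ValueError("need at least 5 points")
    result = []
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        acc = sum(c * v for c, v in zip(SG_D1_COEFFS, window))
        result.append(acc / (SG_D1_NORM * spacing))
    return result
```

Because the coefficients sum to zero, any constant offset in the spectrum is removed by this operation, which is how a derivative can implicitly accomplish baseline removal.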
In an embodiment of the present disclosure the mathematical operation comprises or consists of selecting measurement data at predetermined wavenumbers of the measurement spectrum. Preferably, the predetermined wavenumbers of the measurement spectrum are important for predicting if the infant will develop BPD. Thereby, the measurement data at the predetermined wavenumbers may be indicative of whether the infant will develop, or is at risk of developing, BPD. Preferably, the predetermined wavenumbers are selected such that the measurement data corresponding to the predetermined wavenumbers show a difference, preferably a statistically significant difference, between infants that develop BPD and infants that do not develop BPD. For example, a statistical test may be applied to data acquired, early after birth, from infants for whom it is known whether they developed BPD or not, in order to identify the predetermined wavenumbers that are statistically relevant for predicting BPD. Such data could thereby be considered a training set where the outcome is known, from which the relevant wavenumbers for predicting BPD can be acquired. Preferably such a training set is sufficiently large for ensuring that the difference is statistically significant. Such a statistical test may for example be a paired Cox-Wilcoxon test, such as with a two-tailed p-value <0.05.
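The selection of statistically relevant wavenumbers from a training set may be sketched as below. Note that this sketch does not implement the test named above: as a stand-in it uses an unpaired Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation and without tie or continuity correction, which is an assumption made to keep the example self-contained. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_p_value(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Two-tailed p-value of the rank-sum test (normal approximation)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_wavenumbers(wavenumbers: Sequence[float],
                       bpd_spectra: Sequence[Sequence[float]],
                       non_bpd_spectra: Sequence[Sequence[float]],
                       alpha: float = 0.05) -> List[float]:
    """Keep wavenumbers whose intensities differ between the two groups."""
    selected = []
    for col, wavenumber in enumerate(wavenumbers):
        bpd_values = [spectrum[col] for spectrum in bpd_spectra]
        other_values = [spectrum[col] for spectrum in non_bpd_spectra]
        if rank_sum_p_value(bpd_values, other_values) < alpha:
            selected.append(wavenumber)
    return selected
```

In practice a correction for testing many wavenumbers at once would likely be needed; none is applied in this sketch.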
In an embodiment of the present disclosure the mathematical operation comprises or consists of a partial least square analysis or other methods for multivariate data analysis. PLS may further be used in combination with other classification techniques such as linear discriminant analysis.
In an embodiment of the present disclosure the GAS data is obtained by a process comprising: (non-invasively) obtaining the GAS sample; (optionally) storing the GAS sample; (optionally) pretreating the GAS sample; obtaining spectroscopy data by analysing/measuring the GAS sample by spectrometry, such as mid-infrared spectrometry; and (optionally) applying one or more mathematical operations to the spectroscopy data. Thereby GAS data is derived from spectroscopy measurements of a GAS sample.
Disease
In an embodiment of the present disclosure BPD is defined as a requirement of supplemental oxygen support at a specific number of days after birth, such as at postnatal day 28. Alternatively, BPD can be defined according to the National Institute of Child Health and Human Development (NICHD) definition from June 2000, comprising a severity-based definition that classifies BPD as mild, moderate or severe based on either postnatal age or PMA. Mild BPD is thereby defined as a need for supplemental oxygen (O2) throughout the first 28 days but not at 36 weeks PMA or at discharge; moderate BPD as a requirement for O2 throughout the first 28 days plus treatment with <30% O2 at 36 weeks PMA; severe BPD as a requirement for O2 throughout the first 28 days plus ≥30% O2 and/or positive pressure at 36 weeks PMA. Other definitions, including physiological definitions, exist.
Regardless of which definition of BPD one uses, a period of time is required before the classification of BPD is made. This makes identifying therapies for premature infants at risk of BPD challenging. An infant born at 23 weeks' gestation who needs mechanical ventilation at 34 weeks postmenstrual age is likely to develop BPD, defined as oxygen therapy at 36 weeks. That infant may benefit from strategies that improve short-term outcomes, but which do not reduce the incidence of BPD.
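The severity-based classification described above may be sketched as a small decision function. This is a simplification under stated assumptions: oxygen at 36 weeks PMA is expressed as a fraction of inspired oxygen with 0.21 taken as room air, the severe threshold is read as 30% or more, and the alternative assessment at discharge is not modelled. The function name and parameters are hypothetical.

```python
from typing import Optional

ROOM_AIR_FIO2 = 0.21  # assumed fraction of inspired oxygen without supplementation


def classify_bpd_severity(o2_throughout_first_28_days: bool,
                          fio2_at_36_weeks_pma: float,
                          positive_pressure_at_36_weeks_pma: bool = False,
                          ) -> Optional[str]:
    """Return 'mild', 'moderate' or 'severe', or None if the criterion is not met."""
    if not o2_throughout_first_28_days:
        return None
    if positive_pressure_at_36_weeks_pma or fio2_at_36_weeks_pma >= 0.30:
        return "severe"
    if fio2_at_36_weeks_pma > ROOM_AIR_FIO2:
        return "moderate"
    return "mild"
```

Such a function could be used to derive severity labels as outcome data when the underlying oxygen requirements are recorded.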
ML Model
In a preferred embodiment of the present disclosure, the analysed data result is obtained by analysing the dataset by a trained machine learning model. Preferably, the trained machine learning model is a model trained by supervised training; alternatively, it may be a model trained by a combination of supervised and unsupervised training.
In an embodiment of the present disclosure the trained model is selected from the list including a support vector machine (SVM), a regression model, an artificial neural network, a decision tree, a genetic algorithm, a Bayesian network, or a combination thereof.
In an embodiment of the present disclosure the prediction comprises or consists of a percentage risk of the infant developing BPD, such as development of BPD according to any definition of BPD. Additionally or alternatively, the prediction may comprise predicting the severity of BPD, for example mild BPD, moderate BPD or severe BPD. The model may thereby predict the development of BPD in an infant, and additionally or alternatively predict the severity of BPD. Predicting the severity of BPD may comprise predicting the severity of BPD in the infant according to the NICHD definition of BPD, or any other severity-based classification system of BPD.
In an embodiment of the present disclosure the sensitivity of the prediction is at least 70%, more preferably at least 80%, yet even more preferably at least 90%, most preferably at least 95%.
In an embodiment of the present disclosure the specificity of the prediction is at least 70%, more preferably at least 80%, yet even more preferably at least 90%, most preferably at least 95%.
In an embodiment of the present disclosure the specificity and the sensitivity of the prediction are each at least 70%, more preferably at least 80%, yet even more preferably at least 90%, most preferably at least 95%.
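For reference, sensitivity and specificity may be computed from known outcomes and binary predictions as sketched below, where sensitivity is the fraction of infants who developed BPD that were predicted to do so, and specificity is the fraction of infants who did not develop BPD that were predicted not to. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(outcomes: Sequence[bool],
                            predictions: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary outcomes and predictions."""
    if len(outcomes) != len(predictions):
        raise ValueError("outcomes and predictions must have equal length")
    pairs = list(zip(outcomes, predictions))
    tp = sum(1 for actual, predicted in pairs if actual and predicted)
    fn = sum(1 for actual, predicted in pairs if actual and not predicted)
    tn = sum(1 for actual, predicted in pairs if not actual and not predicted)
    fp = sum(1 for actual, predicted in pairs if not actual and predicted)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative outcome")
    return tp / (tp + fn), tn / (tn + fp)
```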
In a further aspect, the present disclosure relates to the use of a machine learning model for predicting development of BPD in an infant, as disclosed elsewhere herein.
In yet a further aspect, the present disclosure relates to a system for predicting if an infant, early after birth, will develop BPD, the system comprising
In an embodiment of the present disclosure, the system comprises at least one spectrometry unit for obtaining spectrometry data, such as a spectrometer. Preferably the system is configured to obtain GAS data. The system preferably comprises an FTIR spectrometer.
In an embodiment of the present disclosure, the system is portable and/or a bedside system. An advantage of the presently disclosed system is that it enables obtaining a prediction of BPD early after birth, as the system may be present in the delivery room, or close by.
Training
The present disclosure further relates to a method for supervised training of a machine learning model for predicting, early after birth, if a subject (e.g. an infant) suffers from, or will develop, bronchopulmonary dysplasia (BPD), the method comprising: obtaining a dataset, comprising information of a number of infants shortly after birth, comprising clinical data; lung maturity data; and gastric aspirate (GAS) data; obtaining outcome data comprising or consisting of information related to if the infants had, or developed, BPD; training a machine learning model, by supervised training, based on the dataset and the outcome data of the infants, to predict, early after birth, if a subject suffers from and/or will develop BPD.
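A minimal sketch of the supervised training step is given below, using logistic regression fitted by batch gradient descent as one possible example of a regression model. The choice of model, the hyperparameters and the function names are assumptions for illustration; the feature values are expected to be numeric and on comparable scales, and each outcome is 1 if the infant developed BPD and 0 otherwise.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features: Sequence[Sequence[float]],
                   outcomes: Sequence[int],
                   learning_rate: float = 0.1,
                   epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, outcomes):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict_risk(weights: Sequence[float], bias: float,
                 x: Sequence[float]) -> float:
    """Predicted probability of developing BPD for one feature vector."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
```

A usage example would pass one feature vector per infant, combining clinical data, lung maturity data and GAS data, together with the corresponding outcome data; the returned probability may be multiplied by 100 to express a percentage risk.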
In an embodiment of the present disclosure the subject and/or the infants are preterm born infants, such as born before 37 weeks of pregnancy are completed. In a preferred embodiment of the present disclosure, the infant is a preterm born infant, such as an infant born before 37 weeks of pregnancy are completed. Preterm birth, also known as premature birth, is the birth of a baby at fewer than 37 weeks' gestational age, as opposed to the usual about 40 weeks. The infant may however be born at an earlier stage of pregnancy, such as less than 35 weeks' gestational age, or even less than 30 weeks' gestational age. The risk of development of BPD is higher at a lower gestational age.
It is a preference that the dataset comprises or consists of data obtained within 24 hours after birth, such as at birth. The earlier prediction of development of BPD in an infant is made, the earlier a targeted intervention can be started, having the potential to significantly improve outcomes. The early intervention may comprise preventative and targeted prophylactic, therapeutic intervention with surfactant and new medicaments, and/or the mode of ventilation. Various strategies for treatment and preventive therapy of BPD are known to a person skilled in the art.
GAS Data
In an embodiment of the present disclosure the GAS data is derived from, such as comprises or consists of, spectroscopy data, such as mid-infrared spectroscopy data. The GAS data may for example be derived from, or comprise, spectroscopy data in the spectrum between 900-3400 cm−1, such as between 900-1800 cm−1 and between 2800-3400 cm−1. Spectroscopy measurements of GAS, for example by FTIR spectroscopy, typically enable derivation of a highly detailed digital fingerprint of the foetal lung biochemistry. Thereby, GAS data may comprise FTIR spectral wavenumbers and/or absorption intensities and may, combined with other markers, be evaluated for the prediction of BPD.
In an embodiment of the present disclosure, an AI model is trained, based on outcome data, to select data points or spectral lines of a gastric aspirate measurement, wherein the data points or spectral lines are selected to most accurately distinguish between infants that develop BPD and those who do not develop BPD. As such, the training of the machine learning model may not require a priori knowledge of the relevant molecules and biomarkers of the gastric aspirate.
In an embodiment of the present disclosure the GAS data is derived from, such as comprises or consists of, one or more absorption and/or one or more transmission spectra. The GAS data may consist of data derived from a single spectroscopy measurement, or the GAS data may comprise data derived from multiple spectroscopy measurements. Furthermore, the multiple measurements may have been carried out on different types of bodily fluids. In a preferred embodiment of the present disclosure the GAS data is derived from measurements of a GAS sample, such as a pretreated GAS sample.
Spectroscopy Measurements
In a preferred embodiment of the present disclosure, the GAS data is derived from spectroscopy data. The spectroscopy data may have been obtained by spectroscopic analysis of GAS sample(s). The spectroscopy data may reflect the absorption of the GAS sample in the mid-infrared region (3200-900 cm−1).
The GAS data is preferably derived from measurements of a GAS sample. The GAS sample preferably comprises or consists of gastric aspirates. Alternatively or additionally, a GAS sample may comprise or consist of other bodily fluids, such as pharyngeal secretion (e.g. hypopharyngeal secretions or oropharyngeal secretions) and amniotic fluids, or a combination thereof. Preferably, the GAS sample(s) is substantially dry during the analysis/measurement.
Pretreatment
In an embodiment of the present disclosure, the GAS sample is pretreated, preferably non-invasively, prior to spectroscopic analysis. Pretreatment of the GAS sample may for example comprise or consist of centrifugation for formation of a precipitate, and discarding the supernatant. Alternatively or additionally, pretreatment may comprise storage, preferably cold storage, such as around 4° C.
In a preferred embodiment of the present disclosure, the GAS data and/or lung maturity data is derived from measurements of a bodily fluid, such as gastric aspirates (GAS), pharyngeal secretion (e.g. hypopharyngeal secretions or oropharyngeal secretions) or amniotic fluids, that has been pretreated.
Pretreatment of a bodily fluid may for example comprise or consist of cell lysis, e.g. by mixing with a hypotonic solution, centrifugation for formation of a precipitate, and preferably subsequently discarding the supernatant. Alternatively, or additionally, pretreatment may comprise storage, preferably cold storage, such as around 4° C., or even below the melting point.
Erythrocytes and other cells are often present in GAS. To reduce the contamination of GAS from these sources in order to improve the phospholipid measurements, it has earlier been common practice to centrifuge amniotic fluid or GAS and subsequently discard the precipitate prior to measurement of L/S. However, this procedure reduces the amount of surfactant, resulting in less accurate measurements of lung maturity.
Instead, it is a preference that lung maturity data is derived from measurements, such as measurement data, of a bodily fluid, such as GAS, wherein the cells of the bodily fluid have been lysed, such as by mixing with a hypotonic solution. It is further a preference that the bodily fluid, subsequent to lysis, has been centrifuged at a relative centrifugal force (RCF) and time selected such that the lamellar bodies (LBs) of the bodily fluid form a precipitate while the cell fragments, of e.g. lysed cells, and other smaller components, such as salts, remain in the supernatant. An adequate RCF and time may for example be around 4000 g and four minutes. Preferably, the supernatant is discarded following centrifugation. It is further a preference that the measurements of the, preferably diluted and centrifuged, bodily fluid comprise FTIR measurements. The FTIR measurements may thereby be measurements, e.g. dry transmission FTIR, of the LB precipitate for assessment of the lung maturity.
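Since the centrifugation is specified as a centrifugal force in multiples of g rather than as a rotor speed, the corresponding speed depends on the rotor radius. The sketch below uses the standard conversion RCF = 1.118 × 10^-5 × r × N², with r in centimetres and N in revolutions per minute. The 8 cm radius is purely an assumed example value and is not taken from the present disclosure.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * radius_cm * rpm**2

def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) for a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2

def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed (rev/min) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))

# Assumed rotor radius of 8 cm (hypothetical); the target of 4000 g is from the text.
required_rpm = rpm_for_rcf(4000.0, 8.0)
```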
Sphingomyelin is typically sparsely present in the outer membranes of erythrocytes. Therefore, effective removal of erythrocytes before measurements, such as by spectroscopy, e.g. FTIR, may result in slightly increased L/S values, as compared to measurements without removal of erythrocytes. The corresponding L/S cut-off value may as a consequence be higher than it would be without removal of erythrocytes.
In a preferred embodiment of the present disclosure, pretreatment of the bodily fluid comprises dilution with a hypotonic liquid, such as a water solution, e.g. freshwater. Dilution by a low osmolality liquid, such as freshwater, exposes the bodily fluid to hypotonic conditions, causing any present cells, such as erythrocytes, to burst. Preferably, the pretreatment further comprises centrifugation of the diluted bodily fluid. The centrifugation is preferably carried out at a relative centrifugal force, and for a time, such that the lysates (e.g. ruptured membranes of erythrocytes) and other small components of the solution (e.g. proteins and/or salts) end up in the supernatant while the LBs form a precipitate, such as around 4000 g for four minutes. Thereby, the supernatant can be discarded. It is further a preference that the measurements of the, preferably diluted and centrifuged, bodily fluid comprise FTIR measurements. The FTIR measurements may thereby be a measurement of the LB precipitate for assessment of the lung maturity.
Obtaining GAS Sample
In a preferred embodiment of the present disclosure, the GAS sample has been obtained non-invasively. In a further embodiment of the present disclosure the GAS sample has been collected, from the infant, by a feeding tube in combination with means of displacing GAS through said feeding tube, such as a syringe, or a suction catheter. GAS may for example be collected using a feeding tube attached to a syringe or a suction catheter connected to a tracheal suction set. The feeding tube or suction catheters may be placed as routinely done while establishing nCPAP for respiratory stabilisation or intubation for resuscitation.
Clinical Data
In an embodiment of the present disclosure, the clinical data comprises or consists of data selected from the list including birth weight, gestational age, sex, an indicator of whether the infant has been diagnosed with RDS or not, and the severity of RDS (in relevant cases), or a combination thereof. Extreme prematurity and extremely low birth weight have been identified as risk factors for BPD. Gestational age and birth weight are inversely related to the incidence of BPD, as well as to the severity of the disease. Male infants are known to have a higher risk of developing BPD as compared to females. Additional clinical markers for BPD are known, for example those outlined in Trembath et al. “Predictors of Bronchopulmonary Dysplasia”, Clin. Perinatol. 2013.
Lung Maturity Data
In a preferred embodiment of the present disclosure the lung maturity data is a binary value (+/−) representing whether the infant has been given, or is to be given, surfactant treatment or not.
If an infant is to be given surfactant treatment, the treatment is ideally started as soon as possible by the administration of a first dose. Preferably the dose should be given within 1 hr of birth but definitely before 2 hours of age. A repeat dose should be given within 4-12 hours if the infant is still intubated and requiring more than 30 to 40% oxygen. Subsequent doses are generally withheld if the infant requires less than 30% oxygen. Typical surfactants include Survanta, Infasurf and Curosurf, associated with specific dosing guidelines.
In an alternative embodiment of the present disclosure lung maturity data is data derived from measurements of a body fluid, for example gastric aspirates (GAS), pharyngeal secretion (e.g. hypopharyngeal secretions or oropharyngeal secretions) and amniotic fluids, or a combination thereof. The lung maturity data may be derived from a lung maturity test, for example the microbubble stability test, the lamellar body counts and/or spectroscopy measurements. Preferably, in the presently disclosed embodiment, the lung maturity data is, or is derived from, spectroscopic data. Thereby, said measurements of the body fluid may be spectroscopic measurements, preferably non-invasive.
Pulmonary surfactant is a surface-active lipoprotein complex produced in type II pneumocytes in the alveoli and secreted as lamellar bodies (LBs) with lung fluid into the amniotic fluid and GAS. The main lipid content of pulmonary surfactant is DPPC. Consequently, the lung maturity data may reflect the content, or the ratio, of a surface-active lung phospholipid, such as lecithin, e.g. dipalmitoylphosphatidylcholine (DPPC), and/or sphingomyelin. The lung maturity data may for example reflect the lecithin/sphingomyelin ratio (L/S).
In an embodiment of the present disclosure, the lung maturity data is derived from, such as comprises or consists of, spectroscopy data, such as mid-infrared spectroscopy data, for assessment of lung maturity. The spectroscopy data may for example have been recorded in the mid-infrared region (3400-900 cm−1), for example by an FTIR spectrometer.
In an embodiment of the present disclosure the lung maturity data comprises one or more measurement values related to the foetal lung maturity of the infant with respect to a cut-off value. For example a measurement value related to the foetal lung maturity, of the infant, that is below (or above) said cut-off value would be associated with a higher risk of diseases related to foetal lung immaturity (such as RDS) while a measurement value above (below) said cut-off value would be associated with a lower risk of diseases related to foetal lung immaturity. The lung maturity data may thereby comprise the difference between the measurement values and the cut-off value or information whether the measurement value is above, or below, said cut-off value. Said cut-off value may be around 3, preferably around 3.05, such as 3.05 in appropriate units (e.g. moles/mol). Said cut-off value may be an L/S value.
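The two representations described above, namely the difference to the cut-off value and the side of the cut-off on which the measurement falls, can be sketched as follows. The function name and the returned dictionary keys are assumptions made for illustration; the default cut-off of 3.05 is the example value given above.

```python
def lung_maturity_features(ls_ratio, cutoff=3.05):
    """Express an L/S measurement relative to a cut-off value."""
    return {
        "difference": ls_ratio - cutoff,   # signed distance to the cut-off
        "below_cutoff": ls_ratio < cutoff  # True suggests lung immaturity
    }

# Hypothetical measurement value, not taken from the study.
example = lung_maturity_features(2.0)
```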
The lecithin-sphingomyelin ratio (L/S or L/S ratio) is a test of foetal amniotic fluid to assess foetal lung immaturity. Lungs require surfactants to lower the surface tension of the alveoli in the lungs. This is especially important for premature babies trying to expand their lungs after birth.
The L/S is a marker of foetal lung maturity. The outward flow of pulmonary secretions from the foetal lungs into the amniotic fluid maintains the level of lecithin and sphingomyelin equally until around 32-33 weeks of gestational age, when the lecithin concentration begins to increase significantly while sphingomyelin remains nearly the same. As such, if a sample of amniotic fluid has a higher ratio, it is indicative of more surfactants in the lungs and that the infant will have less difficulty breathing at birth.
Mathematical Operations
In an embodiment of the present disclosure the GAS data is derived by application of a mathematical operation to the spectroscopy data. The GAS data may thereby be mathematically derived from spectroscopy data. The mathematical operation may comprise denoising, smoothing, background and baseline corrections, normalization (transforming to a scale of relative intensity), alignment, correction for scatter, such as scattering in NIR, and/or filtering or a combination thereof. The GAS data may thereby be preprocessed in any way.
In general, signal preprocessing is applied to correct and/or remove the contribution of undesired phenomena ranging from stochastic measurement noise to various sources of systematic errors: non-linear instrument responses, shift problems and interfering effects of undesired chemical and physical variations. These operations are also known as denoising, smoothing, background and baseline corrections, normalization (transforming to a scale of relative intensity), alignment (removing horizontal shift), and correction for scatter in near infrared. Moreover, transforming the signal, for example, by derivative operations, can implicitly accomplish normalization, baseline removal and partial band deconvolution. As far as removing horizontal shift is concerned, several algorithms which can aid to remove misalignments have been proposed.
Various filtering methods are known, acting to transform the measured data mathematically into a better version of the same data, leaving out some undesired types of variation, and model-based methods, where the better version is obtained based on a more explicit mathematical model in such a way that the information filtered out is not lost, as statistical estimates of the mathematical parameters involved in the filtering are also obtained.
Among the most used filtering methods for denoising/smoothing, that is, removing uninformative high-frequency variation, are moving average and polynomial Savitzky-Golay filtering. These work on the assumptions that the signal is smooth compared to noise (sum of monotonic functions) and that noise is mainly uncorrelated and will be eliminated by mild methods. Alternatively, high-frequency contributions may be removed in the frequency (Fourier transform) or wavelet (wavelet transform) domain.
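A moving average filter of the kind mentioned above can be sketched as follows. The function name, the default window width and the choice of returning only the points for which a full window is available are assumptions made for illustration.

```python
def moving_average(signal, window=5):
    """Centred moving average; returns only points with a full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [sum(signal[i - half:i + half + 1]) / window
            for i in range(half, len(signal) - half)]

# Hypothetical five-point signal smoothed with a three-point window.
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```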
Therefore in an embodiment of the present disclosure the mathematical operation comprises or consists of a 1st order derivative. Alternatively or additionally, the mathematical operation may comprise or consist of a baseline correction algorithm, such as the Savitzky-Golay algorithm.
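As an illustration of a first-derivative operation, the sketch below applies the well-known five-point Savitzky-Golay first-derivative coefficients (−2, −1, 0, 1, 2)/10 for a quadratic fit. The window width and polynomial order are assumptions, since the disclosure does not state which settings are used. A derivative of this kind is insensitive to a constant baseline offset.

```python
# Five-point Savitzky-Golay first-derivative weights (quadratic fit);
# the weighted sum is divided by 10 times the point spacing.
SG_FIRST_DERIV_5 = (-2.0, -1.0, 0.0, 1.0, 2.0)

def savgol_first_derivative(signal, spacing=1.0):
    """First derivative at each point that has two neighbours on both sides."""
    out = []
    for i in range(2, len(signal) - 2):
        acc = sum(c * signal[i + k - 2] for k, c in enumerate(SG_FIRST_DERIV_5))
        out.append(acc / (10.0 * spacing))
    return out

# For y = x**2 sampled at x = 0..4, the derivative at x = 2 is 4.
derivative = savgol_first_derivative([x * x for x in range(5)])
```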
In an embodiment of the present disclosure, the mathematical operation comprises or consists of selecting measurement data at predetermined wavenumbers of the measurement spectrum. Preferably, the predetermined wavenumbers of the measurement spectrum are important for predicting if the infant will develop BPD. Thereby, the measurement data at the predetermined wavenumbers may be indicative of whether the infant will develop, such as is at risk of developing, BPD. Preferably, the predetermined wavenumbers are selected such that the measurement data corresponding to the predetermined wavenumbers show a difference, preferably a statistically significant difference, between infants that develop BPD and infants that do not develop BPD. For example, a statistical test may be applied to data acquired, early after birth, from infants for whom it is known whether they developed BPD or not, in order to acquire the predetermined wavenumbers, i.e. the wavenumbers that are statistically relevant for predicting BPD. This could thereby be considered to be a training set where the outcome is known, and the relevant wavenumbers for predicting BPD can thereby be acquired. Preferably, such a training set is sufficiently large for ensuring that the difference is statistically significant. Such a statistical test may for example be a paired Cox-Wilcoxon test, such as with a two-tailed p-value <0.05.
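The selection of wavenumbers by a statistical test can be sketched as follows. Note that the sketch substitutes an unpaired Wilcoxon rank-sum (Mann-Whitney) test with a normal approximation for the test named above, without tie or continuity correction, because the exact test procedure is not detailed in the present disclosure. All function names and the toy spectra are assumptions made for illustration.

```python
import math

def rank_with_ties(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_p_value(group_a, group_b):
    """Two-tailed p-value of the rank-sum test (normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = rank_with_ties(list(group_a) + list(group_b))
    u = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))

def select_wavenumbers(wavenumbers, spectra_bpd, spectra_no_bpd, alpha=0.05):
    """Keep wavenumbers whose values differ between the two outcome groups."""
    selected = []
    for j, w in enumerate(wavenumbers):
        a = [s[j] for s in spectra_bpd]
        b = [s[j] for s in spectra_no_bpd]
        if rank_sum_p_value(a, b) < alpha:
            selected.append(w)
    return selected

# Synthetic toy spectra (NOT study data): column 0 separates the groups,
# column 1 does not.
toy_bpd = [[10 + i, 2 * i + 1] for i in range(8)]
toy_no_bpd = [[i, 2 * i + 2] for i in range(8)]
chosen = select_wavenumbers([1000.0, 1100.0], toy_bpd, toy_no_bpd)
```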
In an embodiment of the present disclosure, the mathematical operation comprises or consists of a partial least squares (PLS) analysis or other methods for multivariate data analysis. PLS may further be used in combination with other classification techniques such as linear discriminant analysis.
In an embodiment of the present disclosure, the GAS data is obtained by a process comprising: (non-invasively) obtaining the GAS sample; (optionally) storing the GAS sample; (optionally) pretreating the GAS sample; obtaining spectroscopy data by analysing/measuring the GAS sample by spectrometry, such as mid-infrared spectrometry; and (optionally) applying one or more mathematical operations to the spectroscopy data. Thereby, GAS data is derived from spectroscopy measurements of a GAS sample.
Disease
In an embodiment of the present disclosure, the classification of BPD is defined as a subject requiring supplemental oxygen support at a specific number of days after birth, typically at postnatal day 28. Alternatively, BPD can be defined according to the National Institute of Child Health and Human Development (NICHD) definition from June 2000, comprising a severity-based definition that classifies BPD as mild, moderate or severe based on either postnatal age or postmenstrual age (PMA). Mild BPD is thereby defined as a need for supplemental oxygen (O2) throughout the first 28 days but not at 36 weeks PMA or at discharge; moderate BPD as a requirement for O2 throughout the first 28 days plus treatment with <30% O2 at 36 weeks PMA; severe BPD as a requirement for O2 throughout the first 28 days plus ≥30% O2 and/or positive pressure at 36 weeks PMA. Other definitions, including physiological definitions, exist.
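The severity-based definition above can be sketched as a small decision function. The function name, the argument names, the treatment of an oxygen fraction of 0.21 as room air and the treatment of 30% or more as the severe threshold are assumptions made for illustration; the sketch is not a clinical tool.

```python
def nichd_severity(oxygen_first_28_days, fio2_at_36_weeks,
                   positive_pressure_at_36_weeks=False):
    """Classify BPD severity; fio2 is a fraction (0.21 is taken as room air)."""
    if not oxygen_first_28_days:
        return "no BPD"
    if positive_pressure_at_36_weeks or fio2_at_36_weeks >= 0.30:
        return "severe"
    if fio2_at_36_weeks > 0.21:
        return "moderate"  # still on supplemental oxygen, but below 30%
    return "mild"          # oxygen for 28 days, room air at 36 weeks PMA
```

For example, an infant who needed oxygen throughout the first 28 days and receives 25% oxygen at 36 weeks PMA would be classified as moderate under these assumptions.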
Regardless of which definition of BPD one uses, a period of time is required before the classification of BPD is made. This makes identifying therapies for premature infants at risk of BPD challenging. An infant born at 23-weeks gestation who needs mechanical ventilation at 34 weeks postmenstrual age is likely to develop BPD, as defined as oxygen therapy at 36 weeks. That infant may benefit from strategies that improve short-term outcomes, but which do not reduce the incidence of BPD.
ML Model
In a preferred embodiment of the present disclosure, the analysed data result is obtained by analysing the dataset by a trained machine learning model. Preferably, the trained machine learning model is a supervised trained model; alternatively, it may be a supervised and unsupervised trained model.
In an embodiment of the present disclosure the trained model is selected from the list including a support vector machine (SVM), a regression model, an artificial neural network, a decision tree, a genetic algorithm, a Bayesian network, or a combination thereof.
In an embodiment of the present disclosure the prediction comprises or consists of a percentage risk of the infant developing BPD, such as development of BPD according to any definition of BPD. Alternatively, the prediction may further comprise predicting the severity of BPD, for example mild BPD, moderate BPD or severe BPD. The model may thereby predict the development of BPD in an infant, and additionally or alternatively predict the severity of BPD. Predicting the severity of BPD may comprise predicting the severity of BPD in the infant, according to the NICHD definition of BPD, or any other severity-based classification system of BPD.
In an embodiment of the present disclosure the sensitivity of the prediction is at least 70%, more preferably at least 80%, yet even more preferably at least 90%, most preferably at least 95%.
In an embodiment of the present disclosure the specificity of the prediction is at least 70%, more preferably at least 80%, yet even more preferably at least 90%, most preferably at least 95%.
In an embodiment of the present disclosure, the specificity and the sensitivity of the prediction are each at least 70%, more preferably at least 80%, yet even more preferably at least 90%, most preferably at least 95%.
In an embodiment of the present disclosure, the trained machine learning model is evaluated. The evaluation of the trained machine learning model may be carried out using a dataset and outcome data distinct from those used during the training of the machine learning model.
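A minimal sketch of how sensitivity and specificity may be computed from known outcomes and predictions, and compared against a minimum such as 70%, is given below. The function names and the encoding of outcomes as 1 (BPD) and 0 (no BPD) are assumptions made for illustration, and the example labels are synthetic.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for labels coded 1 = BPD, 0 = no BPD."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def meets_threshold(actual, predicted, minimum=0.70):
    """True if both sensitivity and specificity reach the minimum."""
    sens, spec = sensitivity_specificity(actual, predicted)
    return sens >= minimum and spec >= minimum

# Synthetic labels (NOT study data): 3 of 4 BPD cases and 4 of 5 non-BPD
# cases are predicted correctly.
toy_actual = [1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1]
```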
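One simple way to obtain evaluation data that is distinct from the training data is a random hold-out split, sketched below. The function name, the 20% hold-out fraction and the fixed seed are assumptions made for illustration.

```python
import random

def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with no index in both lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(61)
```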
The present disclosure further relates to a system for predicting if an infant, early after birth, will develop BPD, the system comprising a memory, and a processing unit that is configured to carry out the method for predicting a risk of an infant developing bronchopulmonary dysplasia (BPD), as described elsewhere herein and/or the method for supervised training of a machine learning model for predicting, early after birth, if an infant suffers from, or will develop, bronchopulmonary dysplasia (BPD) as disclosed elsewhere herein.
In an embodiment of the present disclosure, the system further comprises at least one spectrometry unit for obtaining spectrometry data, such as a spectrometer. Preferably said spectrometer is configured to obtain spectrometry data from a GAS sample and to provide said spectrometry data to the processing unit for processing of said spectrometry data. The system may thereby comprise means for providing said spectrometry data to the processing unit and/or the memory. Preferably said system further comprises a power source.
BPD Definition
The Consensus BPD definition from the US National Institutes of Health (NIH) was applied. For infants born at gestational age (GA)<32 weeks, BPD referred to the requirement of oxygen support for at least 28 days (all severities of BPD) supplemented with an assessment at 36 weeks (moderate to severe BPD) and at 40 weeks (severe BPD).
Participants
Premature infants born between 24 and 31 completed gestational weeks were eligible to participate. The infants enrolled in the study were treated as described in Heiring et al. “Predicting respiratory distress syndrome at birth using a fast test based on spectroscopy of gastric aspirates: 2. Clinical part.” Acta Paediatr. 2019, with antenatal steroids and very early nasal-CPAP when possible. Surfactant (Curosurf®) was administered following the European Consensus Guidelines on the Management of RDS as INSURE (Intubation-Surfactant-Extubation) or nasal-CPAP and surfactant administered by a thin catheter.
Sampling of GAS and Spectroscopy
GAS (0.3-2.5 mL) was collected at birth using a feeding tube attached to a syringe or a suction catheter connected to a tracheal suction set. The feeding tube or suction catheters were placed as routinely done while establishing nCPAP for respiratory stabilisation or intubation for resuscitation.
Gastric aspirates obtained immediately after birth were stored at 4-5° C. and analysed by FTIR spectroscopy within 10 days.
The FTIR spectroscopy was performed by dry transmission, and the spectroscopic signal was enhanced by concentrating the surfactant thus avoiding the interference of proteins, salts or flocculent protein clots (e.g. mucus).
GAS (200 μL) was diluted fourfold with water and centrifuged at 4000 g for four minutes. After removal of the supernatant, the samples were suspended in 100 μL of water and split into 50 μL aliquots. 50 μL of sample was measured by FTIR analyses performed by dry transmission on CaF2 windows (1 mm thick and 13 mm diameter, Chrystran.com). The samples (50 μL) were applied onto the CaF2 and dried on a hotplate (90° C.). The FTIR spectra were measured by a Bruker Tensor 27, equipped with a DTGS detector (60 scans and a resolution of 4 cm−1).
Basic Method Development Principles
A data-driven approach was employed to develop a software algorithm capable of predicting BPD. Clinical data and lung maturity data (+/− surfactant treatment) available near the time of birth were combined with FTIR spectral data of GAS, resulting in the creation of highly complex multivariate datasets. These datasets were analysed using AI and correlated to the clinical development of BPD.
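The combination of clinical data, lung maturity data and spectral data into one multivariate dataset can be sketched as concatenating per-infant feature rows. The column-wise standardisation, the function names and the two toy rows are assumptions made for illustration and do not describe the processing actually used in the study.

```python
import statistics

def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    # A constant column has zero spread; divide by 1.0 to leave it at zero.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def build_feature_matrix(clinical_rows, spectral_rows):
    """Concatenate scaled clinical and spectral features infant by infant."""
    if len(clinical_rows) != len(spectral_rows):
        raise ValueError("one clinical row and one spectral row per infant")
    scaled_c = standardize_columns(clinical_rows)
    scaled_s = standardize_columns(spectral_rows)
    return [c + s for c, s in zip(scaled_c, scaled_s)]

# Hypothetical rows: [birth weight (g), gestational age (weeks), surfactant 0/1]
# followed by two made-up spectral values per infant.
matrix = build_feature_matrix([[850.0, 27.0, 1.0], [1350.0, 30.0, 0.0]],
                              [[0.1, 0.2], [0.3, 0.2]])
```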
Statistical Analysis
Clinical data points correlated to BPD were determined by t-test for continuous variables and chi-square test for categorical variables. Paired Cox-Wilcoxon test was used for FTIR spectral data analysis. Two-tailed p-values <0.05 were considered to indicate statistical significance.
FTIR Spectral Data
The FTIR spectral analysis range was 900-3400 cm−1. Baseline was corrected using the Savitzky-Golay algorithm and the 1st derivative was used for spectral data analysis. The Cox-Wilcoxon test was used to further select the most important variables, and 43 wavenumbers were selected out of 1,200.
Model Development
Partial Least Square (PLS)
The PLS algorithm used was similar to that used in Hoskuldsson, “Common framework for linear regression”, Chemometrics and Intelligent Laboratory Systems, 2015. The score plots produced by PLS in combination with other classification techniques such as linear discriminant analysis have in many cases been proved to separate samples for better determination.
Software
RStudio (Microsoft R Open) software was used. An SVM model was built using the kernlab package written in the R programming language. Model performance in the training sample was validated by 7-fold cross-validation repeated 500 times. The criterion for selecting the best parameters was the minimization of classification error. Additionally, the mean sensitivity and specificity of the cross-validation were calculated. The sensitivity was defined as the percentage of infants with BPD who were correctly predicted, and the specificity as the percentage of infants who did not develop BPD who were correctly predicted.
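The repeated cross-validation scheme described above can be sketched as an index generator. The study itself used R; the Python sketch below only illustrates the splitting logic, and the function name, the seed and the way folds are formed after shuffling are assumptions.

```python
import random

def repeated_kfold(n_samples, k=7, repeats=500, seed=0):
    """Yield (train_indices, held_out_indices) for k folds in each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)                     # new random partition per repeat
        folds = [indices[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in indices if i not in held]
            yield train, held_out

# Two repeats over 61 samples give 2 * 7 = 14 train/held-out splits.
splits = list(repeated_kfold(61, k=7, repeats=2))
```

A model would be fitted on each training split and scored on the matching held-out split, and the scores averaged over all splits.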
Results
Of the 72 eligible infants 2 died early after birth and in 9 cases parental approvals were not obtained. Thus, 61 very preterm infants were included in the study as shown in
aMedian (range)
bNo. (%)
Twenty-six (43%) developed BPD and 35 (57%) did not develop BPD. Ten of the infants with BPD also had a need for supplemental oxygen at week 36, and 2 still needed supplemental oxygen at week 40.
A majority, 39 (64%), of the 61 included infants had either BPD combined with RDS (n=22) or no BPD and no RDS (n=18), whereas 4 infants with BPD had no RDS and 17 infants with no BPD had RDS (Table 2).
The 26 infants with BPD had a median birth weight (BW) of 850 g and a median gestational age (GA) of 27.3 weeks, and 20 (77%) were treated with surfactant. The 35 infants with no BPD had a median BW of 1356 g and a median GA of 30.1 weeks, and 7 (20%) were treated with surfactant. BW and GA were significantly lower for infants with BPD than for infants without BPD (p<0.001), and more infants with BPD than with no BPD were treated with surfactant (p<0.001). Surfactant was given at a median of 5.8 hours after birth and at the latest after 33 hours (Table 1). BW, GA and surfactant treatment are important factors correlated to the development of BPD, and by analysing them using a logistic regression model the sensitivity and specificity were 74% and 82% respectively. Similar data were obtained by applying SVM, resulting in 76% sensitivity and 82% specificity.
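As an illustration of the chi-square test used for categorical variables, the sketch below applies a Pearson chi-square test to the surfactant counts reported above (20 of 26 infants with BPD and 7 of 35 infants without BPD treated). The function name is an assumption, and the test is computed without continuity correction; whether a correction was applied in the study is not stated.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, the upper tail probability equals
    # erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Rows: BPD / no BPD; columns: surfactant given / not given.
chi2, p_value = chi_square_2x2(20, 6, 7, 28)
```

Under these assumptions the resulting p-value is below 0.001, which agrees with the significance level reported above.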
The FTIR spectral data analysis of GAS resulted in the identification of the most important wavenumbers for classification. In order to reveal significant differences in the wavenumbers between BPD and no BPD a paired Cox-Wilcoxon test was applied. In total, 43 wavenumbers were selected from the selected FTIR spectral dataset.
Prediction of BPD from FTIR spectral data of GAS samples alone are shown in
By incorporating FTIR spectral data analyses with the clinical data and lung maturity data (BW, GA and surfactant treatment) into the linear SVM analysis the sensitivity increased from 76% to 86% and the specificity from 82% to 85% following cross validation. Using the parameters selected by cross validation, the fitting model was finally calculated for the 61 samples revealing a sensitivity and specificity of 88% and 91% respectively. One GAS sample was contaminated with pus. However, it was still possible to measure the sample using FTIR and correctly predict BPD.
Conclusions
The study demonstrated that it was possible to predict BPD at birth by applying AI to analyse unique multivariate datasets combining clinical data and FTIR spectral data of GAS. Further development and validation of the predictive BPD algorithm are planned, including data aggregation, blind testing and clinical studies.
Items
Number | Date | Country | Kind |
---|---|---|---|
20165923.2 | Mar 2020 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/057944 | 3/26/2021 | WO |