The present invention relates to a method of analysing a motional activity of particles with a motion detector. In particular, the present invention relates to the analysis of the motional activity of particles exhibiting a motion at a very low scale, for instance having a size ranging from Angstroms to micrometres. It particularly but not exclusively relates to the analysis of the movement or an inner dynamics of said particles.
Nano-mechanical oscillators have been widely used as sensors to detect small variations of a mass of a particle. In WO 2013/054311, microcantilevers are used to detect and measure the nano-scale vibrations (or nanomotions) of particles attached to their surface. Said particles can be of chemical or biological nature, including proteins, lipids, glucides, viruses, bacteria, yeasts, or mammalian cells among others. This system allows discriminating between states of the tested particles in presence or absence of stimuli allowing, for instance, an antibiotic susceptibility detection. To this end, the variance of the nano-scale vibrations of the microcantilevers is analysed, wherein an increase of the variance evidences the vibration or motion induced by the particle.
However, using the variance only, i.e., a single statistical parameter, does not in each case suffice to discriminate between different states. For instance, in a measurement recording of a susceptible strain and a resistant strain of particles being bacteria, the populations typically overlap. Therefore, a better way of separating the populations is required. Furthermore, viable and dead cells constitute an example of an extreme difference in the metabolic activity, wherein there is however also a need for a discrimination between less extreme cases or an analysis of other motional activities of particles such as a respiration level, a growth rate, etc.
It is an object of the present invention to provide a method that enables an improved analysis of the motional activity of particles. In particular, it is an object to provide a method that enables an improved analysis of the motional activity of particles in similar states or under complex conditions.
This object is achieved with the method according to claim 1. That is, a method of analysing a motional activity of particles with a motion detector is provided. The motion detector comprises a flexible support being configured to deflect such as bend, a detection device for detecting signals being generated upon the deflection of the flexible support, and an evaluation device for evaluating the detected signals. The method comprises the steps of i) bringing at least one particle into contact with the flexible support, ii) detecting at least one time-dependent signal that is indicative of deflections of the flexible support due to motions of the at least one particle, using the detection device, and iii) evaluating the detected time-dependent signal with the evaluation device. It is preferred that steps i) to iii) are consecutively and in said sequence performed. The evaluation of the detected time-dependent signal comprises a) analysing a time-dependency of the signal to derive a plurality of signal parameters from the signal that characterise a variation of the signal as a function of time, and b) executing a linking algorithm that has, as input variables, an input vector comprising at least a selection of said signal parameters, and has, as an output variable, at least one activity indicator that is indicative of a motional activity of the particle. It is preferred that steps a) and b) are consecutively and in said sequence performed, i.e. step b) is preferably performed after step a).
The present invention is based on the insight that the particles being in contact with the flexible support cause random signals, i.e. signals that vary in time but do not follow a simple oscillation pattern so as to appear, on first sight, to be random, or stated differently, signals being buried in noise background, measurement artefacts and the like, while comprising useful information on the motional activity of the particle, and wherein said information can be extracted using mathematical methods. In fact, the present method enables an extraction of said information by analysing the changes of the signal properties in time. That is to say, in the method according to the invention the time-dependent signal is analysed using mathematical methods. The mathematical methods can be applied to the detected signal in the time domain and/or to transformed signals obtained by transforming the detected signals into the frequency domain. In any case, the signal is detected for a plurality of points in time so as to obtain a time-dependency of the signal, and the time-dependency is analysed in order to determine time-independent signal parameters. These signal parameters characterise the time variation of the signal. At least a selection of these signal parameters is then fed to the linking algorithm as the input variables, and the linking algorithm provides at least one output variable being based on the input variables and being at least one activity indicator being indicative of a motional activity of the particle.
The flexible support can be a cantilever, preferably an AFM cantilever, a fibre such as a hollow fibre or a glass fibre, a membrane, a wire, a sponge, a flexible electrode, etc. It is furthermore preferred that the particles are attached to a surface of the flexible support before being analysed. Said attachment is preferably done according to methods as known in the state of the art such as by functionalizing the surface of the flexible support or by using attachment compounds known in the art, respectively. It is particularly preferred that the particles are attached to the surface of the flexible support as described in WO 2021/130339, which is incorporated herein by reference. That is, it is preferred that the particles are dispersed in a solution comprising at least one of a gelling agent, a gellable agent and a thickening agent, and wherein said dispersion is subsequently added to the surface of the flexible support.
Moreover, the motion detector preferably corresponds to the nanoscale motion detector as disclosed in WO 2013/054311, which is incorporated herein by reference as well. Hence, the deflections of the flexible support are preferably measured by an optical level detection method, wherein changes of position of a spot caused by laser light being reflected from the surface of the flexible support are measured with an optical detector such as a photodetector, particularly preferably with a position sensitive photodetector. The changes of position of the spot cause changes of signal recorded by the detector. As will be explained in greater detail further below, said signal is preferably processed before the signal parameters are derived from it. As such, said signal can also be referred to as a raw input signal (RIS).
The method preferably comprises, in the step of analysing the time-dependency of the signal, defining a plurality of time intervals, wherein the time-dependency of the signal is analysed within each time interval separately in order to obtain a value for each of a plurality of signal estimators for each time interval, and wherein a time series of each signal estimator is analysed in order to obtain each of the signal parameters.
The signal estimators can be determined in the time-domain or in the frequency-domain. In the latter case, it is preferred that the power spectra are estimated for each time interval of the signal.
Each signal estimator value is preferably determined by fitting a noise model to the power spectrum of the signal and/or by applying at least one statistical algorithm to the signal. The signal estimators are preferably estimators of moments of probability density functions and/or geometrical properties of power spectral density functions and/or percentiles of probability density functions and/or correlation parameters and/or partial correlation parameters and/or parameters of auto-regressive moving average models and/or parameters of nonlinear auto-regressive moving average models.
The noise model is preferably selected from a flicker noise model, a white noise model, a band noise model or combinations thereof. The Flicker noise model is preferably expressed as the weighted sum of 1/fα functions, wherein f means the frequency and alpha (a) is a free parameter. The white noise model is preferably expressed as a constant or constant being modulated by frequency responses of the motion detector. The band noise model is preferably expressed as a rational function of an order being preferably optimised for a particular application. For instance, the second order band noise model could correspond to the damped harmonic oscillator-based model in susceptibility tests of E. coli strains. Examples of geometrical properties of power spectral density functions are extreme points, inflection points, the area under different parts of the curve, or horizontal and vertical distances between them.
Regarding the estimators of moments of probability density functions, it should be noted that they can have different orders of moments, they can be standardised or not, etc. For instance, the estimators of moments of probability density functions can be the variance, skewness, kurtosis, etc. It is furthermore preferred that they are optimised for particular applications, e.g., for E. coli antibiotic susceptibility detection the variance could be utilised. Regarding the percentiles of probability density functions, it should be noted that they can have different percentile ranks. For instance, the percentile rank could be the 25th percentile, the 50th percentile (median), the 75th percentile, etc. It is furthermore preferred that the percentile rank is optimised for particular applications, e.g., for metabolic activity classifications the median could be used.
As has been outlined above, the signal analysis can be based on signal parameters being related to different noise models, wherein the signal estimators are determined by fitting a noise model to the power spectral density function of the signal, for instance. However, these theoretical models are as useful as practice follows theory. In order to properly analyse cases where estimated spectra diverge from theoretical models, the geometrical properties of the spectrum are helpful.
Therefore, analysing the time-dependency of the signal using power spectral density functions is a useful method to analyse random signal properties in the frequency domain.
Hence, in the method according to the invention, each signal estimator can be a geometrical property of a power spectral density function.
Various methods of determining the power spectral density function are conceivable, wherein these methods are well-known in the art. For example, the power spectral density function can be determined by the Welch's method.
The geometrical property can be an extreme point of the power spectral density function, an inflection point of the power spectral density function, an area under a curve described by the power spectral density function or a horizontal distance and/or a vertical distance between them. Examples of an extreme point of the power spectral density function are a global maximum or a local maximum or a global minimum or a local minimum of the power spectral density function. The inflection point of the power spectral density function is understood as the point where the power spectral density function changes direction of its curvature. The area under a curve described by the power spectral density function is understood as an area under the power spectral density function between two points on the power spectral density function. Said area is preferably determined by integrating the power spectral density function between said two points. To this end, any two points on the power spectral density function can be used. Horizontal and/or vertical distances between them are, for example, horizontal and/or vertical distances between extreme points, or between inflection points, etc.
Signal parameters that are derived from such signal estimators can be seen as spectral signal parameters. Quantile signal parameters are a generalisation of spectral signal parameters. As has just been outlined above, spectral signal parameters can be calculated on the basis of the power spectral density function. The power spectral density function of the signal is estimated as an average of signal periodograms, wherein a periodogram is the squared magnitude of a Fourier transform of a signal segment. The power spectral density function is statistics. It means that it reveals the properties of a random signal (stochastic process). If the average of periodograms is used then some sporadic but high amplitude events deteriorate this statistic. For this reason, very frequently robust statistics are used like median for instance. However, if these sporadic events carry on useful information, not only median but the probability distribution of periodograms has found to be useful in order to capture said information.
Hence, in the method according to the invention the at least one signal estimator can be obtained by determining, for each time interval, a plurality of periodograms. For all periodograms, at least one quantile for at least one frequency range is determined. The quantile preferably is a percentile. These signal parameters can be referred to as quantile signal parameters.
The quantiles, in particular the percentiles, can be determined for various frequency ranges. For instance, the frequency ranges for which the quantiles or percentiles of the periodograms are estimated can be 0-10 Hz, 10-100 Hz, etc., although other frequency ranges are likewise conceivable. In other words, for each periodogram, quantiles such as percentiles for particular frequency ranges are determined. As such, a set of time series of signal estimators being quantiles of periodograms, in particular being quantiles such as percentiles for specified frequency ranges, are obtained.
This set of time series can be extended, and wherein the at least one signal parameter is obtained from the extended time series. In other words, the time series of the signal estimators can be extended by combining time series of signal estimators, and wherein the signal estimators are obtained from the combined time series.
In particular, the time series of at least one of the signal estimators can be extended so as to form an extended time series by combining at least two signal estimators into differences and/or ratios thereof.
Additionally or alternatively, the time series of at least one of the signal estimators can be extended so as to form an extended time series by transforming a plurality of quantiles, in particular percentiles, into values of an empirical probability density function. In particular, all quantiles such as all percentiles can be transformed to values of an empirical probability density function (EPDF), and wherein two consecutive quantiles such as percentiles are used to calculate one EPDF value. Furthermore, at least one distance between the values of the empirical probability density function and values of a theoretical probability density function can be determined. For instance, the distance can be measured between all EPDF values and theoretical values of a Chi2 probability density function. Examples of distances are Kullback-Leibler divergence and Jensen-Shannon distance, although other distances are likewise conceivable. Additionally or alternatively, at least one geometrical property of the empirical probability density function can be determined and/or various moments of EPDF (mean, variance, skewness, kurtosis, etc.)
At least one signal parameter can be derived from the time series of at least one of the signal estimators and/or from the extended time series of at least one of the signal estimators by i) defining a plurality of sub-series for the time series and/or for the extended time series and, for each sub-series, determine statistical properties and preferably furthermore at least one of ratios, differences and ratios of differences thereof, and/or ii) fitting one or more parametrized curves comprising one or more parameters to the time series and/or to the extended time series, and/or iii) defining a plurality of sub-series for the time series and/or for the extended time series and, for each sub-series, fit one or more parametrized curves comprising one or more parameters in order to obtain fitted parameters and preferably furthermore determine ratios, differences and ratios of differences thereof.
Conceivable statistical properties are percentiles and moments, for instance. Conceivable curves are exponential curves, polynomial curves, trigonometric curves, and linear and nonlinear mixtures thereof.
The power spectral density function of the signal of certain particles revealed a strong 1/f characteristic. This kind of spectrum is often connected with signal self-affinity. Self-affinity can be analysed by detrended fluctuations analysis (DFA) and its generalisation multifractal detrended fluctuations analysis (MF-DFA). In these methods, the dependence of statistical properties of the signal on the time scale is investigated.
Hence, in the method according to the invention, the evaluation of the detected time-dependent signal with the evaluation device can comprise the detection of the time-dependent signal during a time t, and wherein analysing the time-dependency of the signal preferably comprises the steps of:
The digital integration is applied to the signal within every time interval. As a result, the signal within every time interval becomes an integrated signal. The digital integration of any digital signal xi of N samples can be defined by the following equation
The detrending algorithm preferably fits a polynomial of any order (i.e. the second order) to the integrated signal and further subtracts the fit from the integrated signal. That is, the fit residue becomes the detrended integrated signal. The detrending is preferably applied to every subinterval separately. As a result all subintervals become detrended subinterval.
The integrated and detrended signals for all subintervals for the specific number of subintervals the time interval is divided into are combined back into the one signal for the whole time interval (e.g. 20 min interval of the signal is integrated and after that divided into 4 5 min subintervals, each of them is detrended and all are combined back into 20 min interval signal). The result of detrending depends on the number of subintervals the signal time interval is divided into. That is, it depends on the length of each subinterval.
At least one generalized mean of a signal time interval (integrated, detrended for at least one number of subintervals and combined back) is calculated (e.g. F2(128) means generalized mean with exponent 2 and 128 subintervals).
Generalized mean with non-zero exponent/power q of any digital signal xi of length N is defined by the following equation:
Generalized mean with zero exponent/power q of any signal xi of length N is defined by the following equation:
The generalized mean with exponent 2 is simply standard deviation, with exponent 0 it is geometrical mean.
The signal parameter is preferably obtained from generalized means and/or their ratios and/or differences.
In particular, signal parameters can be derived from other signal parameters such as by combining two or more signal parameters into ratios and/or differences thereof, wherein the resulting signal parameters become independent of certain factors such as environmental changes.
The method preferably further comprises executing a feature selection algorithm to automatically select a subset of signal parameters so as to obtain the input vector being fed to the linking algorithm. The feature selection algorithm is preferably configured to select the subset of signal parameters in such a manner that the subset is optimal to a multi-objective criterion.
Preferred feature selection algorithms implement a univariate selection such as F-test, mutual information, chi-squared, a penalty based method such as L1, L2 and elastic net, and forward or backward feature selection based on properties of particular linking algorithm (like coefficients in logistic regression or importances in tree-based algorithms) or based on various forms of cross-validation or bootstrapping with multi-objective criterion.
The multi-objective criterion preferably is at least one of an accuracy, a sensitivity, a specificity, a mean squared error, a mean absolute error, a root mean square error, a fraction of variance unexplained, a coefficient of determination, a number of signal parameters or a generalisation gap.
For instance, the input vector can comprise signal parameters being optimal in the Pareto sense. Being optimal in the Pareto sense is understood as being a non-dominated solution for which an improvement of all criteria is not possible. In other words, improving one criterion would deteriorate at least one other criterion. As an example, when the feature selection algorithm corresponds to the forward selection with cross-validation algorithm, a selection of the signal parameter could occur when cross-validation scores are not improved in the Pareto sense. Additionally or alternatively the input vector can comprise signal parameters being optimal with regard to a single criterion being comprised of a linear combination of two or more, for instance of all criteria. Using a linear combination of two or more criteria can be understood as to capture several aspects of the assessment of the linking algorithm. Additionally or alternatively, the input vector can comprise signal parameters being optimal with regard to one of the criteria mentioned above however under constraints, for instance constraints being defined by experts in the field of the art or on other criteria such as the specificity not being lower than 80%. These constraints can be defined after an analysis of the Pareto optimal solutions.
A drift cancellation is preferably applied to the signal prior to the derivation of the signal parameters from the signal. That is, the drift cancellation is preferably applied to the raw input signal, or RIS, mentioned initially. Again in other words, the signal estimators are preferably determined after the drift cancellation.
The drift cancellation preferably comprises a curve-fitting procedure that models a drift component of the signal or a high-pass filtering. Said drift cancellation serves the purpose of cancelling a drift of the motion detector that manifests itself in the detected signals. Said drift can be caused by various factors such as the temperature, humidity, change of state of a surrounding fluid, etc.
For instance, a drift cancellation can be performed by splitting the signal, in particular the RIS, into intervals and to thereafter apply a curve fitting in order to model a drift component of the drift. Examples of curves to be fitted are polynomials, exponential functions, trigonometric functions, or their mixtures. The fitted curves can be defined in a linear and in a nonlinear form. The type of curve, the number of fitting parameters, and the length of time intervals are preferably optimised for particular applications. The fitting error is preferably used by the next steps of the analysis, in particular in the calculation of the time series of the signal estimators.
Another example of a drift cancellation comprises the application of the just described curve fitting, however in a moving window regime and with fitting parameters that are continuously adapted. The fitting error is preferably used by the next steps of the analysis, in particular in the calculation of the time series of the signal estimators.
In this example, the fitting is preferably repeated for every single point of the signal, in particular the RIS, while using a buffer such as a circular buffer storing the last samples of the signal. The fitting is preferably modified so as to efficiently update curve fitting parameters in order to be able to process millions of signal points. The type of curve, the number of fitting parameters, and the length of the buffer are preferably optimised for particular applications as well.
Another example of a drift cancellation comprises the application of a high-pass filtering to the signal, in particular, the RIS, and wherein the high-pass filtering preferably has a particular order, type, and parameters including a phase shift. Examples of the type are Butterworth, Chebyshev I and II, Kaiser, elliptic, etc. The filter output is preferably used by the next steps of the analysis, in particular in the calculation of the time series of the signal estimators.
The drift cancellation algorithm preferably is a detrending algorithm. Said detrending algorithm is preferably configured to detrend the signal.
In particular, in the event of the at least one signal parameter being a spectral signal parameter as described earlier, it is particularly preferred that a detrending algorithm is applied to the signal before the time-dependency of the signal is analysed within the time intervals.
In the event of the at least one signal parameter being determined from a signal estimator being a quantile such as a percentile of a probability density function of periodograms, it is particularly preferred that a detrending algorithm is applied to the signal before the periodograms are determined, and wherein the plurality of periodograms is determined for the detrended signal of each time interval.
For instance, the detrending algorithm can be a fitting algorithm that fits a polynomial of the kth order to the signal and subtracts the fitted polynomial from the signal, whereby the signal is detrended. By detrending the signal a drift of the signal can be removed.
The linking algorithm particularly preferably is a regression algorithm or a classification algorithm.
In the event that the linking algorithm is a regression algorithm, the multi-objective criterion mentioned earlier preferably is at least one of the mean squared error, the mean absolute error, the root mean square error, the fraction of variance unexplained, the coefficient of determination and the like, the number of signal parameters, or the generalisation gap.
In the event that the linking algorithm is a classification algorithm, the multi-objective criterion mentioned earlier preferably is at least one of the accuracy, the sensitivity, the specificity and the like.
It is furthermore preferred that the linking algorithm is a machine learning (ML) algorithm. In some embodiments, the ML algorithm may be an ML regression algorithm, specifically, an ML algorithm implementing linear regression, decision tree regression, random forest regression, gradient boosting trees regression, Kernel regression, a multiperceptron neural network, and RBF neural network regression. Such algorithms are well known in the art.
In other embodiments, the ML algorithm may be an ML classification algorithm, specifically, an algorithm implementing logistic regression, decision tree, random forest, gradient boosting trees, support vector machines, multiperceptron neural networks, and Radial Basis Functions (RBF) neural networks. Also these types of algorithms are well known in the art.
Many other suitable ML algorithms are known in the art and may be employed.
The particle preferably is a biological object and/or a part thereof and/or a non-biological object and/or a part thereof. A preferred biological object is a cell such as a prokaryotic cell or an eukaryotic cell, a spore, a virus, a phage, and a matter of biological origin such as vesicles, peptides, proteins, polysaccharides, lipids, glucides, nucleic acids and co-polymers, protein-RNA co-polymers, RNA-DNA co-polymers, protein-DNA co-polymers, RNA/DNA-protein co-polymers, protein-protein co-polymers, or capsules. A preferred part of a biological object is an organelle. A preferred non-biological object is a protein, lipid, nucleic acid such as DNA, and nanodevices. The cell preferably is a living cell and/or the organelle preferably is a living organelle. The cell can be a prokaryotic cell and/or a eukaryotic cell. The cell can also be a motile cell and/or a non-motile cell. The cell preferably is a bacterial cell and/or a fungal cell such as a yeast cell, and/or mammalian cell and/or insect cell and/or plant cell. The organelle preferably is a mitochondrium and/or a nucleus or any other subcellular structure.
The activity indicator is preferably indicative of at least one of a metabolic activity of the particle, a physiological state of the particle, a respiration level of the particle, an ATP level of the particle, a ratio of NADH/NAD+ of the particle, a membrane potential of the particle, a growth rate of the particle, a kill rate of the particle, an interaction of the particle with other particles, and an interaction of the particle with an environmental factor of the particle or a combination of any of those.
The environmental factor can be of a chemical nature, e.g., a metabolite or a drug such as an antibiotic, a toxin, a denaturing agent or any other chemical compound affecting the activity of the particle. The environmental factor can however likewise be of a physical nature, e.g., a temperature, humidity, viscosity, oxygen level, redox potential, radiation, pressure or any other impact affecting the activity of the particle. The interaction of the particle with the environmental factor can be understood as a response of the particle to the environmental factor. The metabolic activity of the particle can be a respiration, energy household (e.g., ATP, NADH, NAD+ levels), membrane potential, activity of ion channels, activity of the cytoskeleton, etc. The physiological state of the particle can be a response to one or more of the aforementioned environmental factors. The interaction of the particle with other particles could be a killing, proliferation, predation, mating, quorum sensing, or conformational changes of membrane complexes and/or protein complexes and/or other cellular complexes and/or macromolecular complexes.
The particle is preferably subjected to at least one chemical stimulus and/or at least one physical stimulus. The chemical stimulus preferably is an addition of a drug such as antibiotics or a compound affecting the metabolism or viability of the particle such as the cell. The chemical stimulus can likewise be a change in an environmental condition such as a culture condition, etc. The physical stimulus preferably is an application of stress.
The activity indicator is preferably determined before and/or during and/or after the particle is subjected to the chemical stimulus and/or the physical stimulus. It is furthermore preferred that the thus determined activity indicators are compared with one another.
For instance, if one were to analyse the effectiveness of a drug on particles such as antibiotics on bacteria, the activity indicator could be determined before the addition of the antibiotics to the bacteria and after the addition of the antibiotics to the bacteria. If the antibiotics are effective, one would determine a difference in the activity indicators. For instance, the killed bacteria would no longer exhibit a metabolic activity. In addition, dying or susceptible bacteria can alter their metabolic activity upon the exposure to an antibiotic. That is, the bacteria must not immediately be dead.
In a further aspect, a method of generating a training data set of a machine learning algorithm is provided. Said method comprises the steps of:
The training data set thus comprises a plurality of the signal parameters and the associated activity indicators for a plurality of particles.
In a further aspect, a method of training a machine learning algorithm is provided, wherein the machine learning algorithm is trained with the training data set mentioned above.
That is, the machine learning algorithm is trained with the signal parameters and the associated activity indicators. Consequently, the machine learning algorithm can make predictions of activity indicators based on signal parameters and vice versa without knowledge of the particles and their motional activity being necessary.
For instance, the training can be based on a training data set comprising experimental results being detected by the detection device of a motion detector. This training data set is then preferably labelled by results of reference experiments that measure, for instance, the metabolic activity of interest using a reference method such as the susceptibility of strains being measured by the Kirby-Bauer test. Said labelled data is then used to select the signal parameters, optimise all stages of the linking algorithm and assess performance using cross-validation techniques.
Preferred embodiments of the invention are described in the following with reference to the drawings, which are for the purpose of illustrating the present preferred embodiments of the invention and not for the purpose of limiting the same. In the drawings,
ab shows a graph depicting a signal being detected with the motion detector for a cancer cell line SW480 under different conditions (a, with doxorubicin and b, media control) and for which a signal estimator over time that has been determined during the evaluation with the method according to the invention;
Various aspects of the invention shall now be illustrated with reference to the figures.
As mentioned earlier, the present invention enables an improved analysis of the motional activity of particles with a motion detector 1 comprising a flexible support 2 being in contact with the particles to be analysed and being configured to deflect due to motions of the particles, see
For illustrative purposes only, preferred steps as well as a preferred ordering of these steps that allow the analysis of the motional activity of the particles according to the method of the invention are summarised hereinbelow and with reference to
S0: The starting point of the analysis is at least one time-dependent signal being indicative of the deflections of the flexible support of the motion detector. Here, said signal is referred to as raw input signal, RIS.
In order to cancel any drifts in the RIS, the method preferably subjects the RIS to a drift cancellation.
S1: From the signal, either from the RIS but preferably after its drift cancellation, the method calculates a time series of signal estimators. The calculation of the time series of signal estimators can be done in the time domain or in the frequency domain.
If signal estimators in the time-domain are to be calculated, the signal is split into time intervals, wherein the time-dependency of the signal is analysed within each time interval separately in order to obtain signal estimators, and wherein a time series of the signal estimators is analysed in order to obtain the signal parameters. For instance, the signal estimators can be determined by using at least one of the following:
The signal estimators, for instance the values of particular statistical algorithms or noise models such as time-varying noise model parameters, preferably become samples of a new time series. These new time series of signal estimators preferably consist of fewer points, reveal characteristic patterns, and can be used as input for a determination or an extraction of the signal parameters, see further below.
If signal estimators in the frequency-domain are to be calculated, the signal is preferably split into intervals for which power spectra are estimated. The method of estimation and windowing is preferably optimised for particular applications. After that, various noise models are preferably fitted to a given spectrum and the signal estimators, here the fitting parameters, are preferably used as samples of new time series. Apart from signal estimators being parameters of noise models, also geometrical properties of the estimated spectrum could be measured (e.g. extreme points, inflection points, area under different parts of the curve, horizontal and vertical distances between them).
The result is a set of time series, each one for a signal estimator being a particular fitting parameter or property. Also in this way, the number of points can be significantly reduced, allowing identification of patterns or shapes of the time series characteristic for a given metabolic activity by the extraction of signal parameters, see further below.
S2: After the calculation of the time series of signal estimators, the times series of the signal estimators in the time-domain or in the frequency-domain are transformed to vectors of signal parameters. The transformation is preferably done by means of one of the following three methods.
The result of the extraction of the signal parameters is a large vector of real numbers (usually more than 200). This vector is preferably shortened by selecting a subset of the signal parameters by executing a feature selection algorithm.
S3: Hence, after the extraction of the signal parameters, a selection of signal parameters is performed by means of a feature selection algorithm. In this way, a robustness and a generalisation level of classification or, in a sense, prediction algorithm are provided. The feature selection algorithm selecting the signal parameters is preferably optimised for a particular application and might be one of the following:
The model Pareto optimality concept can be used to select signal parameters' vectors optimal to a multi-objective criterion in forward or backward selection. The number of signal parameters should be a minimum while accuracy is a maximum. Usually, it is not possible to decrease the number of used signal parameters without decreasing accuracy. From these Pareto optimal algorithms, the final linking algorithm comprising the optimal input vector of signal parameters is therefore preferably selected based on domain-specific criteria with human expert assistance, for instance. This additional criterion could be the performance for a special test dataset or additional requirements defined by an expert.
S4: The input vector obtained after the selection of the signal parameters is fed to a linking algorithm. The linking algorithm uses the input vector of signal parameters to output the activity indicator of the particles. This activity indicator may be discrete, where the input vector of signal parameters extracted and selected in the previous steps is used as an input for a classification algorithm. That is, the linking algorithm corresponds in this case to a classification algorithm. The classification algorithm can be one of the following:
The type of linking algorithm and values of its hyper-parameters are preferably optimised for particular applications.
The activity indicator may also be a real number, where the input vector of signal parameters is also used as an input for a regression algorithm. That is, the linking algorithm corresponds in this case to a regression algorithm. The regression algorithm is preferably one of the following:
The type of linking algorithm and values of its hyper-parameters are preferably optimised for particular applications.
More practically speaking, the particles being in contact with the flexible support cause random signals buried in a significant noise background with useful information coded in both time and frequency domain. Particles such as organisms evolve in time under different stimuli (like nutrients, antibiotics, chemicals) that affect the properties of the random signals. The signal properties that provide for instance diagnostic information can be extracted using the method according to the invention by observing their changes in time under properly optimised measurement conditions, for instance. Due to the biological variability between tested organisms and between the different conditions measured, the herein proposed method has proven as a powerful tool that allows separating diagnostic information from measurement artefacts and noise background. Relating this example to the analytic outline depicted in
In this experiment, the reference strain ATCC-25922 of the Gram-negative bacterium E. coli was cultured to late logarithmic phase (OD600=1=109 cells/mL) under standard laboratory conditions (i.e., nutrient-rich Miller's Luria Bertani media (LB), aerobic, 37° C.) and harvested using centrifugation (2000 g, 3′). The pellet was separated and split into two groups for the subsequent analysis. The first group of bacteria was kept alive at room temperature while the second group was exposed to 60° C. for 20 min to generate non-viable bacteria due to the denaturation of enzymatic complexes and other cellular essential processes. As expected and confirmed by an independent measurement, in this latter group, we were unable to detect metabolic activity using the redox-dependent fluorescent dye resazurin that shifts its emission/excitation spectrum due to respiratory activity, a main indicator for the metabolic state of a cell.
Prior to the start of the experiment, the microcantilever of the motion detector was functionalized with a linking agent required for attaining a sustainable cell attachment that reliably lasts throughout the nanomotion recording. The attachment was done as described in WO 2021/130339 A1. In fact, in this experimental setup, positively charged Poly-D-Lysine (PDL) was used as it can facilitate the attachment of E. coli cells exhibiting a negatively charged lipopolysaccharide surface. In fact, most cells exhibit a negative net charge on their surface and in cases where this is not the case the functionalizing agent can be adapted, see also WO 2021/130339A1. Here, a solution of 0.1 mg/ml Poly-D-Lysine in ultrapure water was applied on the microcantilever for 20 min, rinsed off again with ultrapure water and dried. Using the motion detector, the deflections of the functionalized bare microcantilever were recorded in ½ concentrated LB and served as the Blank for the subsequent measurements of the two groups of E. coli cells. Both viable and heat-killed groups of E. coli cells were then attached to a microcantilever in parallel recordings. That is, two phases of a signal recording were done: Blank phase, where the deflections of an empty cantilever are measured followed by the Bac phase where the deflections of a cantilever with attached bacteria are measured.
From the deflection of the microcantilever and after drift cancellation using linear detrending (1st order polynomial) of 10s long intervals a time series of a signal estimator, in this case the variance (second moment of probability density function), is calculated and plotted for 20 min, first for the bare microcantilever (Blank phase) and second for the microcantilever with attached ATCC-25922 for each group of viable and non-viable cells (Bac phase). That is, in
From the variance, i.e., the signal estimator, the signal parameter was calculated: the median of the variance over 20 min of recording for the Blank and the Bac phases. In the Blank phase the mean μ was 1.6×10−6 V2/V2. After attachment of viable cells, in the Bac phase, the variance increased by one order of magnitude to 1.5×10−5 V2/V2, whereas the median of the variance of heat-killed, non-viable bacteria in the Bac phase stayed at the level of the Blank phase (μviable=1.5×10−5 V2/V2 vs μheatkilled=2.5×10−6 V2/V2, p=0.0013, t-test, p of the variance was calculated over 20 min) in concordance with other metabolic activity assessments as, e.g., using the fluorescent dye resazurin, mentioned earlier.
With reference to the further figures, more complex scenarios with less extreme differences between groups (in regard to biological qualities or phenotypes) are analysed by the method according to the invention.
To this end,
In fact, for the analysis presented in
The two patterns recognizable in
For this reason, signal parameters were calculated and thresholded to determine if a given strain is susceptible or resistant—that is, if the value of a signal parameter is greater than a threshold value, the strain is determined as resistant, and if the value is not greater than the threshold value, the strain is determined as susceptible. To determine whether cells with different reactions to antibiotic stress can be classified, the signal of two additional susceptible clinical isolates RN-26 and RN-49, as well as of two additional resistant clinical isolates B1 and B15 from a Swiss hospital were detected with the motion detector. In total 67 experiments with these strains exposed to ceftriaxone were recorded. Signal parameters were extracted from single signal estimators that gave one of the best separations between the two different groups as for example the Median of the first 30 min of the Drug phase normalised to the Median of the entire Drug phase or the last 30 min of the Drug phase normalised to the Median of the entire Drug phase.
From
In
In practical applications however, the extraction and selection of signal parameters need to be optimised carefully. The large number of signal parameters results in so-called algorithm overfitting. In this case, high performance for the dataset used for fitting is not maintained when the algorithm is used in real conditions. For this reason, the cross-validation procedure is applied, where the dataset is split into k subsets. The algorithm is fitted on k−1 subsets and validated on the last one not used for fitting. The process is repeated k times every time with the new subset excluded. In this way the estimated mean accuracy reveals the problem of overfitting (an overfitted algorithm has worse accuracy than not overfitted).
The cross-validation helps to select only those signal parameters that handle useful information. The selection starts from one signal parameter and continuously increases the number of parameters one by one (forward selection) and after that starts to remove signal parameters (backward selection) up to a single one. The cross-validation helps to assess how addition or removal of signal parameters affects the indicator performance measured by specified single or multi-objective criterion. This procedure allows the extraction of many signal parameters from many time series of signal estimators and the selection of only the most important ones.
In the just presented example, a rather small dataset was used. In the following and with reference to
In a first step, a drift cancellation of the detected RIS was performed. To this end the detected RIS was split into 10 s intervals. Thereafter, a linear fitting of a polynomial of the 1st order to the RIS for 10 second intervals was applied to calculate the fitting error being the input for the calculation of the signal estimator, herein below of the variance (estimator of the 2nd moment of probability density function). In this way the signal estimator over time, i.e. a variance signal s(t) with 0.1 Hz sampling frequency is obtained.
In the frequency domain, the 85 seconds intervals are used for power spectrum estimation and fitting by mixture of white and second order band noise models with common equation (1),
where f means frequency, A, f0, and Q are parameters of the band noise, and B is a parameter of the white noise (a constant). This results in additional four signals (A(t), f0(t), Q(t) and B(t)) with sampling frequency around 0.01 Hz ( 1/85). Together with the variance signal s(t) these (time series of statistical estimators) are the input for the determination of the signal parameters.
In said determination of the signal parameters, the signal parameters are calculated. The signals s(t), A(t), f0(t), Q(t), B(t) are divided into intervals of 20 minutes length for the Bac phase and 30 minutes for the Drug phase, see
where mi, mj are medians within the ith and jth interval as defined in
For the variance signal in the drug phase, the linear and exponential fitting are also applied, and the fitted parameters are additional four signal parameters (the slope a, the intersection b, the growth/decay rate τ, and the magnitude c). Equation (3) defines the exact mathematical formulas,
where coefficients of determination of both fits and ratios c/T and b/a are among the set of extracted signal parameters. The illustration of linear and exponential fit is shown in
As a result, a set of 192 signal parameters has been determined or extracted and has been subjected to signal parameters selection stage, i.e., has been fed to the feature selection algorithm.
In the table depicted in
With reference to
That is, susceptible cells exposed to a toxin or antibiotic exhibit a decrease in viability resulting eventually in death. In the case of bacteria exposed to a bactericidal antibiotic, a kill rate can be measured by determining the change of the number of colony-forming units (CFUs) at different time points. A CFU is thereby defined as a bacterium (=a unit), that through multiple replication cycles forms a visible colony on growth permitting medium. The number of these CFUs is therefore a measure of the concentration of viable bacteria in the bacterial suspension of interest. Multiple samplings over time assess the change in CFU numbers of the surviving bacteria and therefore serve to calculate a kill rate. This method has been a very reliable standard technique for more than a century already applied by for instance Robert Koch.
However, one of the major drawbacks is its dependence on growth and thus the time a bacterium needs to form a visible colony for analysis. Fast growing bacteria like K. pneumoniae form colonies visible by eye in ten hours, slow growing pathogens like M. tuberculosis however take several weeks. In any case, it is a very reliable but slow method.
In the present example depicted in
However, and as follows from
With reference to
In the example depicted in
As mentioned earlier and as illustrated in
In order to structuralize the description of signal parameters the polish notation together with some formal structure of signal parameters names are used.
The signal parameters names have the following structure
(e.g f0_Bac__80-90_p50—means f0 signal estimator for which 50th percentile is calculated for bac phase for time interval from 80 minute to 90 minute). Statistics used are percentiles, means, standard deviation (std), and slope.
Polish notation is a way to code arithmetic expressions without using parentheses e.g.
/+a b−a b is equivalent to (a+b)/(a−b). It is used, to automatically code signal parameters being ratios and differences of other signal parameters.
The feature selection algorithm selected the following noise signal parameters as pareto optimal based on accuracy and number of signal parameters:
As follows from
The following embodiments explore three aspects of the time-dependency of the signal being evaluated by the evaluation device: the complex aspects of power spectral densities not covered by theoretical noise models, the related to the particle vibrations scarce events in the signal that might vanish during power spectral density estimation and self-affinity of cell vibrations.
In particular, a typical spectrum and signal estimators that could be derived from a signal being detected with a motion detector and being analysed with the method according to the invention are presented in
The normalised power normNi and normNEi for every i Ni/(N1+N2+N3+N4+N5) and NEi/(N1+N2+N3+N4+N5) These time series of signal estimators can be used to calculate signal parameters and/or their ratios and differences.
As follows from
KLPQ—Kullback-Leibler divergence between Chi2 probability density function and empirical probability density function calculated basing on percentiles,
KLQP—Kullback-Leibler divergence between empirical probability density function calculated basing on percentiles and Chi2 probability density function,
Kurtosis—is kurtosis (a moment) of empirical probability density function,
JS—is Jensen-Shannon distance between Chi2 probability density function and empirical probability density functions,
MO—(p60−p40)/(p90−p10) (ratios and differences)
EEPDF—is mean (a moment) of the empirical probability density function
They provide every 5 min samples of 112 time series of signal estimators from which signal parameters and their ratios and differences for different percentiles, frequency ranges and time intervals are calculated. They are objects of the feature selection algorithm.
The current findings about cell vibrations revealed that their power spectral density function has strong 1/f characteristic. This kind of spectrum is often connected with signal self-affinity. Self-affinity can be analysed by detrended fluctuations analysis (DFA) and its generalisation multifractal detrended fluctuations analysis (MF-DFA). In these methods the dependence of the signal statistical properties on the time scale is investigated. In the presented embodiments, the MF-DFA based signal parameters were used. In particular, the following 21 generalized mean exponents/powers are used: −10.0, −8.0, −6.0, −4.0, −2.0, −1.0, −0.8, −0.6, −0.4, −0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0. Together with 9 numbers of subintervals (4, 8, 16, 32, 64, 128, 256, 512, 1024) results in 189 signal parameters, from which the ratios and or differences are further calculated.
In
Analogous to the examples consisting of relatively small datasets depicted in
In
In this, True positives are considered correctly classified experiments with susceptible strains, True negatives are correctly classified experiments with resistant strains, False positives are experiments with falsely classified resistant strains, and False negatives refer to experiments with falsely classified susceptible strains.
Classification model no. 1 (classification model is an instance of a linking algorithm being a classification algorithm) is based on a single SP (F128) with arguably the biggest impact, leading to a classification accuracy of 85.8%. Model no. 2 with two SPs (F128, F61) achieved 89.7% accuracy, with three SPs (F128, F61, F146_F129) 91.4%, and finally, 93.1% accuracy was reached with model no. 4 combining four SPs (F128, F61, F146_F129, F7_F0)—summarised in
The score, i.e. the classification indicator for each of the 233 experiments is benchmarked against the reference MICs that are either considered susceptible (S REFERENCE) or resistant (R REFERENCE) in
In conclusion, one SP, that can be considered a single aspect of the nanomotion signal, is less suited than 4 SPs for describing the diverse response to the drug found in such a diverse dataset of 83 different strains. The increase of SPs and their combination significantly improves the performance of the classification algorithm.
In another example, an even bigger and more diverse dataset of 487 samples comprising 160 clinical isolates of two different bacterial species, E. coli and K. pneumoniae, were exposed to one concentration of ceftriaxone (indicated as “Conc.”,
The linking algorithm being classification algorithm with a high performance of separating experiments with resistant strains from experiments with susceptible strains is based on the combination of the four signal parameters described above resulting in a score, i.e., classification indicator that assumes positive values for experiments with predicted susceptibility (S INVENTION) and negative values with experiments of predicted resistance (R INVENTION). The accuracy reached 91.6%, sensitivity 89.0% and the specificity 94.6% (
A similar analysis of 210 samples of 127E. coli strains exposed to 8 μg/ml ciprofloxacin (CIP, indicated as “Conc.” in
The final linking algorithm being classification algorithm was based on four SPs and from the 210 samples it correctly classified 190. Thus, it achieved an accuracy of 90.5%, a sensitivity of 89.8% and a specificity of 91.2% (
Yet, in another analysis, 155 samples of 125 diverse E. coli strains from different strain collections were exposed to a third antibiotic cefotaxime (CTX). In the same nanomotion measurement setup as described in 7a and 7b, we used 32 μg/ml cefotaxime (indicated as “Conc.” in
The linking algorithm being classification algorithm based on five SPs led to only 11 false classifications, resulting in an accuracy of 92.9%, a sensitivity of 91.7% and a specificity of 94% (
In summary, for each of the four bacteria-drug combinations in
The two aforementioned drugs cefotaxime and ciprofloxacin impede different cellular processes of the bacterial cell. Cefotaxime belongs to the family of beta-lactam antibiotics interfering with the cell wall metabolism while ciprofloxacin as a quinolone binds topoisomerases involved in DNA folding, a process that impact replication and effectively all processes in which the DNA is involved. The proper functioning of both drug targets is essential. While for both, their impediment is in the long run detrimental for the cell, both their triggered cellular stress responses differ.
On a total dataset size of 404 experiments, of which 202 were performed with susceptible E. coli strains and 32 μg/ml cefotaxime and an equal number of experiments with susceptible E. coli strains and 8 μg/ml ciprofloxacin, the combination of nanomotion recordings and machine learning was applied to develop linking algorithms being classification algorithms to separate the information entailed in the nanomotion response to cefotaxime from ciprofloxacin. All 404 experiments were used simultaneously to develop the classification model. If an experiment was performed with cefotaxime and afterwards correctly predicted as such by the method, it was considered correctly classified. The same accounted for ciprofloxacin. The experimental setup was again identical to the one described for
Quantile signal parameters are used and linking algorithm being classification algorithm being a support vector machine algorithm with radial basis functions. The feature selection algorithm selected the following pareto optimal signal parameters on the basis of accuracy and number of signal parameters:
In a linking algorithm being classification algorithm the predicted susceptible response to ciprofloxacin was assigned negative score values, while the predicted response to cefotaxime was assigned positive score values (similar to a classification algorithm for R and S phenotypes). Wrongly classified experiments assumed positive values for ciprofloxacin and negative values for cefotaxime, accordingly. On a high-performing classification algorithm based on seven SPs the accuracy for both drugs reached 85.4%. 87.0% of the ciprofloxacin experiments were correctly classified and 83.7% for cefotaxime. The impact of every additional signal parameter is presented in
Besides chemical stressors, as shown for different drugs (
Comparing the nanomotions at two different temperatures and thus expectedly a rather global change in the cell's metabolic activity, the classification algorithm's accuracy in separating both conditions ranges from 97% with a single SP up to 99% for five SP. In
The doxorubicin susceptible SW480 was cultured under standard laboratory conditions (i.e., cell culture medium (Dulbecco's Modified Eagle Medium (DMEM) containing 10% heat-inactivated fetal calf serum (FCS), 37° C., 5% CO2) in cell culture flasks. In preparation of nanomotion experiments, cells were detached, collected in cell culture medium, and washed in DMEM containing 10% FCS. The cell suspension was used for the attachment to the cantilever, and all nanomotion measurements with the device were performed in DMEM containing 10% FCS. The colon cancer cells were attached to the cantilever. Three phases of signal recording were performed: (i) Blank phase, where the deflections of the bare cantilevers are measured, (ii) the Medium phase, where the deflections of the cantilever with attached SW480 were measured followed by (iii) the Drug phase, where the deflections of the cantilever with attached SW480 after exposure to a drug were recorded or a second Medium phase without doxorubicin. The recordings presented in the following were conducted in a motion detector installed in a CO2 supplied incubator to allow optimal culture conditions for cancer cells.
The signal parameters selected by the feature selection algorithm basing on pareto optimality, accuracy and number of signal parameters are as follows:
Indeed, the accuracy of pareto optimal linking algorithms ranges from 83.2% for one SP up to 91.6% for seven SPs. In
Besides qualitative assessments of metabolic states of a cell by using classification models, the impact of interference at a cell's metabolic activity by a drug can be quantified using regression models. We used the clinical E. coli isolate IMHA-2155385, which is susceptible to the drug combination ceftazidime-avibactam. This cephalosporin/beta-lactamase inhibitor combination attacks bacterial cell wall synthesis and simultaneously blocks the beta-lactamase-mediated resistance mechanism. In
The presented results confirm that the invention allows measuring the increased impact of metabolic activity related to increasing drug concentration.
Number | Date | Country | Kind |
---|---|---|---|
22162147.7 | Mar 2022 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2023/055596 | 3/6/2023 | WO |