The present invention relates to fitting metabolite spectrum data to a model of metabolite data. The present invention also relates to generating a report and personalised advice based on comparison of fitted values with model references and client data.
So-called western dietary patterns (that is, high in saturated fat, cholesterol, sodium, and added sugars; low in fruits, vegetables, and fibre) increase the risk of obesity and many non-communicable diseases, including diabetes, coronary heart disease, and cancers. Overall dietary patterns might be more informative about non-communicable disease risk than individual foods or nutrients. Many governments have introduced population-based policies aiming to improve dietary patterns and reduce disease burden. These policies have a common core goal (reflected in the WHO Global Strategy on Diet, Physical Activity and Health) of decreasing added sugar, sodium, and total fat consumption, and increasing intakes of wholegrain cereals, fruits, vegetables, and fibre. Results from the North Karelia project showed that such dietary change can contribute to decreased coronary heart disease mortality at the population level.
A major limitation of nutritional science is the objective assessment of dietary intake in free-living populations. Monitoring of dietary change in national surveys and large prospective studies relies on self-reported food intake using instruments such as food frequency questionnaires, dietary recall, and diet diaries; the prevalence of misreporting with these tools is estimated at 30-88%. Compounding this problem, bias in dietary misreporting (with under-reporting biased towards unhealthy foods and over-reporting towards fruits and vegetables) contributes to data inaccuracy and misinterpretation. Moreover, under-reporting of dietary energy intake is particularly common in obese individuals, which is a major concern considering the increasing prevalence of obesity worldwide.
According to a first aspect of the present invention, there is provided a method of fitting spectrum data to a model of biological substance data. The method comprises receiving spectrum data and receiving fitting data for each of a plurality of biological substances, wherein the fitting data comprises, for each of the plurality of biological substances: a number of reference multiplets for that biological substance, and for each reference multiplet, the position of the centre of that reference multiplet, the number of peaks for that reference multiplet, the relative amplitude of each peak, and the width of each peak. The method further comprises determining a fitting order of the reference multiplets, wherein the position of each reference multiplet in the fitting order is based on the number of possible overlaps with other reference multiplets comprised in the fitting data, starting with the fewest overlaps and ending with the most. The method further comprises, for each reference multiplet, according to the fitting order: performing a first grid search to identify one or more first correlations between the reference multiplet and the spectrum data, wherein the first grid search uses a first interval size; performing a second grid search on a range of wavelengths encompassing the one or more first correlations and using a second interval size smaller than the first interval size, wherein the second grid search identifies one or more second correlations; determining the second correlation corresponding to the best match between the reference multiplet and the spectrum data; and, in dependence upon the best match exceeding a detection threshold: assigning the biological substance corresponding to that reference multiplet as present; determining a concentration of that biological substance based on the portion of the spectrum data corresponding to the best matched reference multiplet; based on the determined concentration, generating a synthetic spectrum corresponding to the concentration of that biological substance; subtracting the synthetic spectrum from the spectrum data; removing all the reference multiplets for that biological substance from the fitting order; and updating the fitting order of the reference multiplets using the remaining reference multiplets.
The number of first correlations and the number of second correlations are not necessarily equal.
A multiplet may have one or more peaks.
Known biological substances with distinct spectrum patterns, for example, urea in urine data, may be identified and subtracted from the spectrum before further analysis.
The biological substance fitting data may also include hyperparameter data. Hyperparameter data may include the number of intervals between peaks used in either the first or second grid search and/or the number of iterations applied when performing the first or second grid search.
Reference multiplets with the same number of overlaps may be further ordered by degree of overlap.
Overlap may be in wavelength or equivalents, for example, frequency, wavenumber, chemical shift, amplitude, magnitude or other distinguishing metric.
Reference multiplets with the same degree of overlap may be further ordered by relative amplitude for a standard concentration.
The standard concentration may be, for example, 1 millimol/l or 1 nanomol/l.
The method may further comprise iteratively performing the first and/or second grid search.
The method may further comprise normalising the spectrum data to the model of biological substance data.
Normalising the spectrum data to the model of biological substance data may comprise performing one or more amplitude multiplications on at least a portion of the spectrum data. The at least a portion of the spectrum data may be the portion corresponding to the best match correlation.
The detection threshold for the best match between the reference multiplet and the spectrum data may be six sigma.
The biological substance spectrum data (28) may be nuclear magnetic resonance spectrum data.
The first grid search between the biological substance fitting data and the spectrum data may be performed using a series of chemical shifts as centres of the multiplets.
The biological substance spectrum data is from a urine sample. The urine sample may be a 24 hour urine sample. The urine sample may be a spot urine sample.
The biological substance spectrum data may comprise data from biological substances from food.
Determining a fitting order of the reference multiplets may be based on the number of peaks in the reference multiplet.
For example, those multiplets with the greatest number of peaks may be fitted before those with fewer peaks.
The biological substance may be a metabolite. A metabolite may be an intermediate or end product of a metabolic process.
The method may further comprise performing a baseline correction of all or at least part of the spectrum data. The baseline correction may be performed around the peaks representing each biological substance, or around each multiplet. The baseline correction may be performed using a convex hull.
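By way of illustration only, a baseline correction using a convex hull may be sketched as follows. The sketch assumes that the baseline is taken as the lower convex hull of the spectrum points and is subtracted after linear interpolation; the function names lower_hull and baseline_correct and the example values are hypothetical and do not come from the specification.

```python
def lower_hull(points):
    """Lower convex hull of (x, y) points sorted by increasing x (monotone chain)."""
    hull = []
    for p in points:
        # Drop the last hull point while it lies on or above the chord to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def baseline_correct(x, y):
    """Subtract the linearly interpolated lower hull from the spectrum (assumed approach)."""
    hull = lower_hull(list(zip(x, y)))
    corrected = []
    seg = 0
    for xi, yi in zip(x, y):
        # Advance to the hull segment that contains xi.
        while seg < len(hull) - 2 and xi > hull[seg + 1][0]:
            seg += 1
        (x1, y1), (x2, y2) = hull[seg], hull[seg + 1]
        base = y1 + (y2 - y1) * (xi - x1) / (x2 - x1)
        corrected.append(yi - base)
    return corrected


# Hypothetical spectrum: one peak at x = 2 sitting on a sloping baseline.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.5, 5.0, 2.5, 3.0]
corrected = baseline_correct(x, y)
```

In this example the sloping baseline is removed and only the peak height above the baseline remains.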
The biological substance spectrum data may comprise data from biological substances from drugs, for example, from prescription drugs.
The method may be a computer implemented method.
The relative amplitude of each peak of the multiplet may be expressed as a normalised amplitude, where the highest peak of that multiplet is recorded as 1 and the heights of the remaining peaks, if any, are expressed as a proportion of the highest peak.
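For example, the normalisation of peak heights within a multiplet may be sketched as follows. The function name normalise_amplitudes and the example heights are hypothetical.

```python
def normalise_amplitudes(heights):
    """Scale peak heights so that the tallest peak of the multiplet is 1."""
    tallest = max(heights)
    if tallest <= 0:
        raise ValueError("a multiplet needs at least one positive peak height")
    return [h / tallest for h in heights]


# Hypothetical triplet with raw heights in arbitrary units.
triplet = normalise_amplitudes([2.0, 4.0, 2.0])
```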
According to a second aspect of the invention, there is provided a method of analysing biological sample data, the method comprising: receiving a biological sample and sample collection data including at least a sample date and time, which are associated with a unique sample identifier. The method further comprises storing the sample collection data, sample date and time on a secure server, generating biological substance spectrum data from the biological sample, performing the method of the first aspect of the invention, identifying a model to apply to the biological substance spectrum data based on sample collection date and/or time, standardising the biological substance spectrum data axis to the number of data points used by the model, applying the model to the biological substance spectrum data, comparing the fitted values of the spectrum data with the model references, obtaining adherence to nutritional health score guidelines, generating figures for a report based on the nutritional health score guidelines and the outcome of the application of the model, and generating a report and personalised advice based on comparison of fitted values with model references and client data.
The client data may be encrypted.
The method may further comprise generating a unique sample identifier; and sending a biological sample collection kit to a client associated with the unique sample identifier.
The biological substance spectrum data is nuclear magnetic resonance spectrum data.
The guidelines may be World Health Organisation healthy eating guidelines.
According to a third aspect of the invention, there is provided a method of obtaining the percentage adherence of biological substance spectrum data to a model. The method comprises receiving biological substance spectrum data (28) and sample collection time (24), receiving a model based on sample collection time, the model comprising a plurality of sub-models, for each sub-model: centring and scaling the spectrum data based on model and sub-model parameters, multiplying the biological substance spectrum data by sub-model coefficients for each sub-model of the model, generating a distribution of percentiles of predicted adherence, calculating the probability for each value of predicted adherence, and calculating the median value of predicted adherence from the distribution of probabilities.
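Purely as an illustrative sketch, and not as a statement of how the sub-models are actually parameterised, the steps above may be approximated as follows. The dictionary keys mean, scale, coef and intercept, the clipping of predictions to 0 to 100%, and the rounding to whole percentages before the probabilities are counted are all assumptions made for this example.

```python
from collections import Counter


def predict(spectrum, sub_model):
    """Centre and scale the spectrum, then apply one sub-model's coefficients."""
    z = [(v - m) / s for v, m, s in zip(spectrum, sub_model["mean"], sub_model["scale"])]
    raw = sub_model["intercept"] + sum(zi * c for zi, c in zip(z, sub_model["coef"]))
    return min(100.0, max(0.0, raw))  # keep the prediction within 0 to 100% (assumption)


def median_adherence(spectrum, sub_models):
    """Median predicted adherence from the probability of each predicted value."""
    preds = [round(predict(spectrum, sm)) for sm in sub_models]
    counts = Counter(preds)
    cumulative = 0.0
    for value in sorted(counts):
        cumulative += counts[value] / len(preds)  # probability of this value
        if cumulative >= 0.5:
            return value


# Hypothetical two-point spectrum and three hypothetical sub-models.
spectrum = [1.0, 2.0]
sub_models = [
    {"mean": [0.0, 0.0], "scale": [1.0, 1.0], "coef": [10.0, 10.0], "intercept": b}
    for b in (20.0, 30.0, 40.0)
]
result = median_adherence(spectrum, sub_models)
```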
Sample collection time may include sample collection date.
The distribution of percentiles of predicted adherence may be between 0 and 100%.
The biological substance spectrum data may be from a urine sample.
According to a fourth aspect of the invention, there is provided a method of generating a model from biological substance spectrum data. The method comprises importing biological substance spectrum data and model parameters, applying repeated measures scaling to the biological substance spectrum data, calculating a model by performing the following steps n number of times: allocating biological substance spectrum data to training, optimisation and test sets, obtaining scaling parameters and applying the scaling parameters to the training, optimisation and test data sets, calculating models having one or more different hyperparameters on the training data set, selecting optimal hyperparameters using the optimisation set, applying the/a set of model coefficients to the test data, obtaining an estimate of predictive ability for the current iteration, storing the training set and test set for the current iteration, calculating an overall measure of predictive ability across all iterations, and outputting model parameters for all iterations.
The model parameters may be user-specified.
The user-specified parameters may comprise at least one from the list of: the type of scaling, number of iterations, the part of the data that will be split into a test portion, a different level of alpha.
According to a fifth aspect of the invention, there is provided a computer program which comprises instructions for performing a method according to any previous aspect.
According to a sixth aspect of the invention, there is provided a computer readable medium which stores a computer program according to the fifth aspect.
According to a seventh aspect of the invention, there is provided a computer system comprising: memory; and at least one processing unit; wherein the at least one processing unit is configured to perform the method of any previous aspect.
Referring to
The workstation 2 includes non-volatile memory 8, memory 9 and a processor 10. The non-volatile memory includes application software 11. The secure server 3 includes memory 15, a processor 16, and non-volatile memory 17. The memory 9 may include fitting data (not shown) and hyperparameter data (not shown). The fitting data may include fitting data for each of a plurality of biological substances, wherein the fitting data comprises, for each of the plurality of biological substances: a number of reference multiplets for that biological substance, and for each reference multiplet, the position of the centre of that reference multiplet, the number of peaks for that reference multiplet, the relative amplitude of each peak, and the width of each peak.
A user 18 (also referred to as a client) who wishes to have their diet analysed may request a sample collection kit 19. The sample collection kit may contain instructions for sample collection and a BD Vacutainer complete urine collection kit, including a complete system for urine collection, with a collection cup, one evacuated tube and a towel or towelette for patient cleansing prior to collection (see https://www.bd.com/en-us/offerings/capabilities/specimen-collection/urine-specimen-collection/bd-vacutainer-collection-and-transfer-products/bd-vacutainer-complete-urine-collection-kits for details of the kit contents). The sample collection kit 19 allows the user 18 to collect samples 20 from their body. The sample collection kit 19 may also allow for the safe storage and transport of the sample 20. The sample 20 may be any suitable sample 20 which can be used to assess the diet of the user, for example, a urine sample (for example, a spot urine sample or a 24-hour urine sample), a faecal sample, a blood sample or a saliva sample. Once a user 18 has collected a sample 20 using the sample collection kit 19, the user enters sample collection data 4 to the secure server 3 using an interface (not shown) such as a web page or a smartphone app. The sample collection data 4 includes a sample identification number 22 (also referred to as “sample ID”, or “sample identifier”), and the date 23 and time 24 of the sample 20 collection. The sample ID 22 is included in the sample collection kit 19 to allow for the identification of the sample 20 and user 18. The sample ID 22 may contain information about the user 18, the type of analysis to be performed and the type of sample to be taken.
The user 18 sends the sample 20 to an analyser 27. The analyser 27 analyses the sample 20 and produces raw spectrum data 28 of the sample 20. The analyser 27 may be any suitable spectrometer, for example, a nuclear magnetic resonance spectrometer such as a 600 MHz nuclear magnetic resonance spectrometer. The raw spectrum data 28 is then sent to the workstation 2 where it is stored in the non-volatile memory 8. The workstation 2 receives the sample collection data 4 from the secure server 3. The processor 10 processes the raw spectrum data 28 and the sample collection data 4 using the application software 11 to produce the report 5. The report 5 may then be sent to the secure server 3.
Referring also to
Referring to
Referring to
The sample identifier 22 is decoded to obtain the user information and the sample information (step S5), also referred to as metadata 4. For example, the decoded sample ID may contain user information such as name, sex, age, etc. and the sample information may include sample number, collection date and time. The metadata 4 for the sample 20 is stored in the non-volatile memory 17 of the secure server 3 (step S6).
After step S3, the sample 20 is received from the user 18 for analysis (step S7). The sample 20 is then analysed to obtain spectrum data (step S8). For example, the sample 20 may be analysed for the presence and/or concentration of biological substances, e.g. metabolites, present in the sample 20. Metabolites may be intermediate or end products of metabolic reactions that occur within biological cells. Metabolites may be low molecular weight organic compounds within a mass range of 50-1500 Daltons. The spectrum analysis may be nuclear magnetic resonance (NMR) spectroscopic analysis. The spectrum analysis may be mass spectrometry (e.g. with possible chromatographic separation by liquid chromatography, gas chromatography or capillary electrophoresis), or Raman spectroscopy. The spectrum data is then transferred to the workstation 2 (step S9). The raw spectrum data 28 may then be imported using the application software 11 (step S10). The raw spectrum data 28 is then corrected, for example using a baseline correction (step S11). As will be explained in more detail later, the raw spectrum data 28 is then processed to fit its peaks to the peaks of known spectrum data (step S12). The fitted spectrum data is then calibrated to an internal standard, for example, normalised to an internal standard (step S13). The processed spectral data, for example processed metabolite spectral data, is then standardised (step S14). For example, if the spectrum data is NMR spectroscopy data, the chemical shift axis is standardised (using 1D cubic spline interpolation) to the number of data points used by one or more models applied later in the process. For example, there may be 16,000 points used by these models. For mass spectrometry, the peaks need to be aligned with the reference/model data so that the data are comparable. For Raman spectroscopy, similar to NMR data processing, the spectrum is interpolated to the same number of points as the model data.
Using the metadata 4 and the standardised spectrum data, a model is identified using the time the sample was taken (step S15). If the sample was taken from 9 am to 1 pm, model 1 is used (cumulative sample for after breakfast to before lunch); if from 1 pm to 6 pm, model 2 is used (cumulative model for after lunch to before dinner); and if from 6 pm to 8 am, model 3 is used (cumulative model for after dinner, overnight and to before breakfast). The selected model is applied to the processed spectrum data (step S16). As will be explained in more detail later, the adherence of the processed spectrum data to the nutritional health guidelines is obtained (step S17). Optionally, pictorial representations of the adherence such as diagrams, figures, charts and plots are generated (step S18).
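The selection of a model from the sample collection time may, for example, be sketched as follows. The function name select_model is hypothetical, and the treatment of the window boundaries (start inclusive, end exclusive) is an assumption. The interval from 8 am to 9 am is not assigned to a model in the description above, so the sketch returns None for it rather than guessing.

```python
from datetime import time


def select_model(collected_at):
    """Return the model number for a collection time, or None if unassigned."""
    if time(9, 0) <= collected_at < time(13, 0):
        return 1  # after breakfast to before lunch
    if time(13, 0) <= collected_at < time(18, 0):
        return 2  # after lunch to before dinner
    if collected_at >= time(18, 0) or collected_at < time(8, 0):
        return 3  # after dinner, overnight and to before breakfast
    return None  # 8 am to 9 am is not assigned in the description


morning = select_model(time(10, 30))
overnight = select_model(time(2, 0))
```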
Once the processed spectrum data, for example processed metabolite spectrum data, is standardised in step S14, individual biological substances are fitted to a known spectrum of the biological substance data under investigation (step S19). For example, if the spectrum data is metabolite spectrum data, the individual metabolites are fitted to a known spectrum of the metabolite data. The fitted values of the processed spectrum data are then compared with the model reference values and a difference is obtained (step S20).
Using the stored metadata 4, the comparison between the fitted values of the processed spectrum data and the model reference values (step S20), and the (optional) pictorial representations of the adherence of the processed spectrum data to the model (step S18), a report 5 is generated. The report 5 may include personalised dietary advice for the user 18 (step S21). The report 5 is sent to the secure server 3 (step S22) and the user 18 is given access to the report 5 from the secure server 3 via an interface such as a webpage or smartphone app. The raw and processed data are stored on the secure server 3 in the non-volatile memory 17.
Referring to
The biological substance fitting data comprises, for each of the plurality of biological substances: a number of reference multiplets for that biological substance, and for each reference multiplet, the position of the centre of that reference multiplet, the number of peaks for that reference multiplet, the relative amplitude of each peak, and the width of each peak.
A fitting order of the reference multiplets is determined (step S33). The position of each reference multiplet in the fitting order is based on the number of possible overlaps with other reference multiplets comprised in the fitting data, starting with the fewest overlaps and ending with the most.
Known biological substances with distinct spectrum patterns, for example, urea in urine data, may be identified and subtracted from the spectrum at any stage of the process and therefore not included for further analysis.
Reference multiplets having the same number of overlaps may be further ordered by degree of overlap, for example, of neighbouring multiplets. Overlap may be in wavelength or equivalents, for example, frequency, wavenumber, chemical shift, amplitude, magnitude or other distinguishing metric. Reference multiplets with the same degree of overlap may be further ordered by relative amplitude for a standard concentration. The standard concentration may be, for example, 1 millimol/l or 1 nanomol/l.
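As a minimal sketch of the ordering described above, each reference multiplet may be represented by a hypothetical dictionary with the keys name, centre, half_width and amplitude. The sketch assumes that two multiplets overlap when their chemical shift ranges intersect, that the degree of overlap is the total shared width, and that a larger relative amplitude is fitted first when all else is equal; none of these details is fixed by the description.

```python
def overlap_width(a, b):
    """Width of the range shared by two multiplets, or 0.0 if they do not overlap."""
    lo = max(a["centre"] - a["half_width"], b["centre"] - b["half_width"])
    hi = min(a["centre"] + a["half_width"], b["centre"] + b["half_width"])
    return max(0.0, hi - lo)


def fitting_order(multiplets):
    """Sort by number of overlaps, then degree of overlap, then amplitude."""
    def key(m):
        widths = [overlap_width(m, other) for other in multiplets if other is not m]
        n_overlaps = sum(1 for w in widths if w > 0)
        # Larger amplitude first among ties (assumed direction).
        return (n_overlaps, sum(widths), -m["amplitude"])
    return sorted(multiplets, key=key)


# Hypothetical reference multiplets (positions in ppm).
multiplets = [
    {"name": "A", "centre": 1.00, "half_width": 0.05, "amplitude": 1.0},
    {"name": "B", "centre": 1.08, "half_width": 0.05, "amplitude": 1.0},
    {"name": "C", "centre": 1.15, "half_width": 0.05, "amplitude": 1.0},
    {"name": "D", "centre": 3.00, "half_width": 0.05, "amplitude": 1.0},
]
order = [m["name"] for m in fitting_order(multiplets)]
```

Here D has no overlaps and is fitted first, A and C each have one overlap but A overlaps less, and B overlaps both of its neighbours and is fitted last.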
For each reference multiplet, according to the fitting order (step S34), a first grid search is performed (step S35) to identify one or more first correlations between the reference multiplet of the fitting data and the spectrum data from the sample. The grid search uses a first interval size to identify the correlations. Optionally, the first grid search may use more than one interval size, for example, in an iterative way.
A second grid search is then performed (step S36) on a range of wavelengths encompassing the one or more first correlations and using a second interval size smaller than the first interval size. The second grid search identifies one or more second correlations. The second correlation corresponding to the best match between the reference multiplet and the spectrum data is then determined (step S37). The number of first correlations and the number of second correlations are not necessarily equal.
The first and second grid searches may be performed iteratively.
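The two grid searches may be illustrated by the following sketch, which searches for the centre of a single-peak reference multiplet. The use of a Lorentzian line shape, the Pearson correlation as the measure of match, and all function names and numerical values are assumptions made for this example only.

```python
import math


def lorentzian(x, centre, gamma):
    """Single peak of unit height (the line shape is an assumption)."""
    return gamma ** 2 / ((x - centre) ** 2 + gamma ** 2)


def pearson(u, v):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def grid(lo, hi, step):
    """Evenly spaced candidate centres from lo to hi inclusive."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


def two_stage_search(x, y, gamma, lo, hi, first_step, second_step):
    def score(centre):
        return pearson([lorentzian(xi, centre, gamma) for xi in x], y)

    # First grid search: coarse interval, keep the local optima.
    first = [(c, score(c)) for c in grid(lo, hi, first_step)]
    optima = [
        c for i, (c, s) in enumerate(first)
        if (i == 0 or s >= first[i - 1][1])
        and (i == len(first) - 1 or s >= first[i + 1][1])
    ]
    # Second grid search: finer interval around each local optimum.
    second = [
        (c, score(c))
        for p in optima
        for c in grid(p - first_step, p + first_step, second_step)
    ]
    return max(second, key=lambda cs: cs[1])  # best match


# Hypothetical spectrum with one peak centred at 1.03 ppm.
x = [i * 0.01 for i in range(201)]
y = [3.0 * lorentzian(xi, 1.03, 0.02) for xi in x]
best_centre, best_score = two_stage_search(x, y, 0.02, 0.5, 1.5, 0.1, 0.01)
```

The coarse search locates the neighbourhood of the peak near 1.0 ppm, and the finer search then recovers the centre at 1.03 ppm.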
The spectrum data may be normalised to the model of biological substance data. Normalising the spectrum data to the model of biological substance data may comprise performing one or more amplitude multiplications on at least a portion of the spectrum data. The at least a portion of the spectrum data may be the portion corresponding to the best match correlation.
The biological substance fitting data may also include hyperparameter data. Hyperparameter data may include the number of intervals between peaks used in either the first or second grid search and/or the number of iterations applied when performing the first or second grid search.
If the best match exceeds a detection threshold (step S38), the biological substance corresponding to that reference multiplet is assigned as present (step S39). If the best match does not exceed the detection threshold, then the process returns to before step S33. The detection threshold can be predetermined or calibrated, based on known values and the biochemistry of an individual user 18. The best match may be required to meet a correlation significance threshold and to be significant after multiple testing correction using, for example, Hommel's correction. Other multiple testing corrections may be applied.
The detection threshold for the best match between the reference multiplet and the spectrum data may be six sigma.
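One possible reading of a six sigma detection threshold is sketched below, assuming that sigma is the standard deviation of the noise in a signal-free region of the spectrum and that the fitted amplitude is compared against six times that value. This interpretation, the function name is_detected and the example values are assumptions.

```python
import statistics


def is_detected(fit_amplitude, noise_region, n_sigma=6.0):
    """True if the fitted amplitude exceeds n_sigma times the noise standard deviation."""
    sigma = statistics.pstdev(noise_region)
    return fit_amplitude > n_sigma * sigma


# Hypothetical noise samples with a standard deviation of about 0.1.
noise = [0.1, -0.1, 0.1, -0.1]
strong = is_detected(0.7, noise)
weak = is_detected(0.5, noise)
```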
A concentration of that biological substance is determined based on the portion of the spectrum data corresponding to the best matched reference multiplet (step S40). The concentration may be determined by, for example, integration, but any suitable method may be used.
Based on the concentration determined, a synthetic spectrum corresponding to the concentration of that biological substance is generated (step S41). This synthetic spectrum is then subtracted from the spectrum data (step S42) so that the spectrum data no longer shows that biological substance as present. Next, all the reference multiplets for that biological substance are removed from the fitting order (step S43). If all substances are fitted, then the process ends (step S44). If there are biological substances remaining to be fitted, the fitting order of the reference multiplets is updated using the remaining reference multiplets (step S45).
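Steps S40 to S42 may be illustrated by the following sketch. It assumes that the reference template represents the biological substance at unit concentration, that the concentration is obtained as the ratio of trapezoidal areas over the matched portion, and that the synthetic spectrum is the template multiplied by that concentration. The function names and the example values are hypothetical.

```python
def trapezoid(x, y):
    """Area under y(x) by the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def subtract_synthetic(y, template, concentration):
    """Remove the fitted substance from the spectrum."""
    return [yi - concentration * ti for yi, ti in zip(y, template)]


# Hypothetical spectrum: the fitted substance at three times the reference
# concentration, plus an unrelated peak at x = 5.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
template = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
y = [0.0, 3.0, 6.0, 3.0, 0.0, 5.0, 0.0]

portion = slice(0, 5)  # region of the best matched reference multiplet
concentration = trapezoid(x[portion], y[portion]) / trapezoid(x[portion], template[portion])
residual = subtract_synthetic(y, template, concentration)
```

After subtraction only the unrelated peak remains, which makes the remaining reference multiplets easier to fit.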
The biological substance spectrum data 28 may be nuclear magnetic resonance spectrum data. If the biological substance spectrum data 28 is nuclear magnetic resonance spectrum data, the first grid search between the biological substance fitting data and the spectrum data is performed using a series of chemical shifts as centres of the multiplets.
The biological substance spectrum data may be from a urine sample. The urine sample may be a 24-hour urine sample. The urine sample may be a spot urine sample. The biological substance spectrum data may comprise data from biological substances from food. The biological substance spectrum data may comprise data from biological substances from drugs, for example, from prescription drugs. The biological substance may be a metabolite. A metabolite may be an intermediate or end product of a metabolic process.
Alternatively, the biological substances present in the fitting data may be ordered according to decreasing complexity of combined multiplets, that is, the biological substances with the most complex multiplets (e.g. by number of peaks in the multiplet) are ordered first. The biological substances present in the fitting data may be ordered according to the number of peaks in the reference multiplet. For example, those multiplets with the greatest number of peaks may be fitted before those with fewer peaks.
Alternatively, the biological substances present in the fitting data may be ordered in the following way. First, biological substances in high concentrations in the sample 20 that do not overlap with other biological substances in the region where their multiplets appear. Second, biological substances in high concentrations that always appear in urine of which clear signals are always observable from a spectrum, for example, an NMR spectrum (within the region where we expect to see these clear peaks based on (potential) variability of the chemical shift). Third, the remaining biological substances that are important to the model used for fitting. To determine this set of biological substances, each biological substance is evaluated to determine which of these have clear multiplets in regions of the spectrum (e.g. NMR spectrum) with no overlap with other compounds. The stability of the positions (e.g. the stability of the chemical shift positions if using NMR spectrums) of the biological substances on the spectrum are also taken into account when determining this set. The order in which biological substances are fitted may be dynamic, for example, the order may be updated after a particular biological substance has been fitted and then eliminated from the data, leaving biological substances which are more easily fitted to the model.
If a biological substance has a multiplet that can be easily identified in the spectrum data (e.g. urea) all of its signals can be fitted. This could be at the beginning of the fitting process (where there is no overlap) or after peaks from other biological substances are fitted and removed from the data.
For example, there are biological substances whose peaks are always at exactly the same chemical shift in NMR spectrum data, and these can be readily fitted. Other biological substances, e.g. citrate or 3-methylhistidine, tend to show variability in where their peaks appear; for example, where the variability for creatinine is ppm±0.01, it could be ppm±0.15 for 3-methylhistidine. In this case, there is a higher potential for these biological substances (metabolites) to appear in a larger region, and hence more potential overlap with other biological substances. These biological substances would be fitted later, when they cannot be ‘confused’ with other compounds. These ordering methods may be performed instead of, or in combination with, each other, depending on the type of biological substance data to be fitted.
It is also possible to start with the metabolite that is best defined in the fitting data, for example the metabolite with the least overlap between its peaks and the peaks of other metabolites. Likewise, metabolites that are known to always be present in urine samples and visible in NMR spectral data may be fitted first. For example, urea and creatinine are often present and may be fitted first, whereas paracetamol metabolites are only present if the person took paracetamol, hence these signals are only fitted after other, more commonly found metabolites have been fitted. Metabolites which are deemed more important for a particular model may be prioritised over others which may exist in the sample but have not been identified in the particular model, for example arginine. Arginine is well defined, but may not be considered important in our model, hence arginine may be fitted at a later stage.
During the first grid search, local optima of the correlations are identified and evaluated, and then for each of these local optima, a second grid search is performed. The second grid search may use smaller intervals and a greater number of intervals. The set of multiplets being evaluated (having one centre for each, one amplitude applied to all) that best fits the data is chosen based on it being at most six standard deviations of noise higher than the peak. This may be applied to all peaks in all multiplets. The set of parameters that best fits the data with an amplitude greater than zero is chosen, except when no positive correlations are found. If no positive correlations are found, the fit amplitude is zero and no fit is found.
In other words, local optima of correlations between the biological substance fitting data and the spectrum data are identified by performing a first grid search using a number of intervals between two peaks of the biological substance fitting data as a hyperparameter. Next, a subset of correlations between the biological substance fitting data and the spectrum data are identified by performing a second grid search on these local optima using a greater number of intervals between two peaks of the biological substance fitting data than were used in the first grid search as a hyperparameter.
After performing these steps, a few potential fits per multiplet are found and stored (step S36). Each of these fits is evaluated to see which combination best fits the data (step S37). This can be done by applying an amplitude multiplication to each set. This multiplication allows us to see whether this combination of multiplets (that correlate locally) has the correct ratios expected from the peaks of the biological substance and whether this gets close to the actual spectrum. For example, the amplitude found by the standard spectrum may be multiplied (e.g. see
In other words, a single correlation from the subset of correlations is selected by applying an amplitude multiplication to each of the spectrum data in the subset of correlations and comparing the ratios between at least first and second peaks in the multiplet of the biological substance spectrum data with the corresponding peaks in the biological substance fitting data.
The concentration of the biological substance in the spectrum data is then determined by integrating the multiplet of the biological substance spectrum data with the highest relative amplitude (step S38). The fit for the biological substance concerned is saved or stored (step S39). The identified multiplets of the spectrum data are eliminated from further processing by subtracting the identified multiplets from the spectrum data, allowing the remaining multiplets to be fitted more easily. The process is repeated until all peaks from all biological substances are fitted. The fitted values and spectrum location, and relative amplitude of the biological substance are then output.
The method of fitting the peaks may be expressed as:
where a = amplitude, γ = gamma, xo = center of multiplet, xoδ = difference of peak to center, x = evaluate at this (ppm) value, and f(i) = fit at index i in x. This may be performed for all multiplets, and for all peaks in each multiplet.
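The expression itself is not reproduced in the text above. Purely as an illustration of how the listed symbols could combine, the following sketch assumes a Lorentzian line shape in which each peak is positioned at the multiplet centre plus its offset. The choice of line shape, the per-peak relative amplitudes and the function name multiplet_fit are assumptions and should not be read as the expression of the specification.

```python
def multiplet_fit(x, a, gamma, x0, deltas, rel_amps):
    """Evaluate an assumed Lorentzian multiplet at each (ppm) value in x.

    a: amplitude applied to the whole multiplet
    gamma: peak width parameter
    x0: centre of the multiplet
    deltas: offset of each peak from the centre
    rel_amps: relative amplitude of each peak (tallest peak equal to 1)
    """
    return [
        sum(
            a * r * gamma ** 2 / ((xi - (x0 + d)) ** 2 + gamma ** 2)
            for d, r in zip(deltas, rel_amps)
        )
        for xi in x
    ]


# Hypothetical single peak evaluated at its own centre returns the amplitude.
single = multiplet_fit([1.0], a=2.5, gamma=0.02, x0=1.0, deltas=[0.0], rel_amps=[1.0])

# Hypothetical symmetric doublet evaluated at its two peak positions.
doublet = multiplet_fit([0.99, 1.01], a=1.0, gamma=0.001, x0=1.0,
                        deltas=[-0.01, 0.01], rel_amps=[1.0, 1.0])
```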
The method may be a computer-implemented method.
The user-specified model parameters may include the multiple testing correction type (types of false discovery rate (FDR) or family-wise error rate (FWER)), the significance level (also known as the alpha level), the maximum number of components the model will attempt to evaluate, and whether or not the data will be corrected for orthogonal signals (for example, for repeated measures data). Further user-specified model parameters may include the number of bootstraps performed on the training model, with the optimal parameters chosen, to find the spread of coefficients in the iteration. For example, 25 bootstraps may provide enough data and allow the data to be saved efficiently (for a model of 1,000 iterations, there are then 25 additional models per iteration, so 25,000 in total, and the variance of the coefficients is calculated across these 25,000).
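To illustrate how the choice of multiple testing correction type and alpha level changes which results count as significant, the sketch below contrasts one FWER procedure with one FDR procedure. Bonferroni and Benjamini-Hochberg are used only as familiar examples of each family; the description does not name specific procedures, and the function names are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """FWER control: reject where p <= alpha / m, with m the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """FDR control: find the largest rank k with p_(k) <= k * alpha / m,
    then reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            reject[i] = True
    return reject
```

For the same alpha level, the FDR procedure rejects at least as many hypotheses as the FWER procedure, which is why the correction type is worth exposing as a user-specified parameter.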
Repeated measures scaling is applied to the spectrum data (step S53). To apply the repeated measures scaling, the data belonging to each individual is centred on that individual's mean spectrum. This is performed for each individual independently. Splitting of the data is performed per person (user 18) and not per sample, so all samples from the same person (user 18) are always in the same set (that is, all in the training set, all in the optimisation set, or all in the test set).
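The per-individual centring and the person-level split can be sketched as follows. The data layout (a list of person identifier and spectrum pairs), the 60/20/20 split fractions and all names are assumptions made for illustration; the description does not specify them.

```python
import random
from collections import defaultdict

def centre_per_person(samples):
    """samples: list of (person_id, spectrum) pairs, spectrum being a list of floats.
    Subtract each person's mean spectrum from every spectrum of that person."""
    by_person = defaultdict(list)
    for pid, spec in samples:
        by_person[pid].append(spec)
    means = {pid: [sum(col) / len(specs) for col in zip(*specs)]
             for pid, specs in by_person.items()}
    return [(pid, [v - m for v, m in zip(spec, means[pid])])
            for pid, spec in samples]

def split_by_person(samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Allocate whole persons, never individual samples, to the three sets."""
    people = sorted({pid for pid, _ in samples})
    random.Random(seed).shuffle(people)
    n_train = int(round(fractions[0] * len(people)))
    n_opt = int(round(fractions[1] * len(people)))
    groups = {
        "training": set(people[:n_train]),
        "optimisation": set(people[n_train:n_train + n_opt]),
        "test": set(people[n_train + n_opt:]),
    }
    return {name: [s for s in samples if s[0] in ids]
            for name, ids in groups.items()}
```

Splitting on the person identifier rather than on the sample index is what prevents samples from one individual leaking between the training, optimisation and test sets.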
Next, the model is calculated by iteratively performing the following steps. The biological substance spectrum data is allocated to one of a training set, an optimisation set or a test set (step S54). The imported scaling parameters are applied to the training set (step S55). Next, the scaling parameters are applied to the optimisation set and the test set (step S56). A variety of models are then calculated on the training set using different hyperparameters (step S57). The hyperparameter used at this step may be the number of components in a partial least squares (PLS) model. However, other models may be used, for example ridge regression, in which case the hyperparameter to be optimised is lambda (for regularisation). The optimisation set is then used to select the optimal hyperparameters (step S58). The coefficients calculated in step S57 are then applied to the test data (step S59). Applying the coefficients in this way allows an estimate of the predictive ability for the current iteration to be obtained (step S60). The model (that is, the training set) and the predictive values (the test set) are saved for the current iteration (step S61). Steps S54 to S61 are then repeated for a user-specified number of iterations (step S62).
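The iteration over steps S54 to S62, followed by the aggregation in step S63, can be sketched as below. To stay short and dependency-free, the sketch uses ridge regression with a single predictor and no intercept, mean squared error as the measure of predictive ability, and a 60/20/20 person-level split; these are all simplifying assumptions, as are the data layout and every name used.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge coefficient for one predictor without intercept.
    Raises ZeroDivisionError if all xs are zero and lam is 0."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, beta):
    """Mean squared prediction error of y = beta * x."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def run_iterations(data, lambdas, n_iter, scale, seed=0):
    """data: {person_id: [(x, y), ...]}; scale: (mean, sd) applied to x in every set.
    Returns the per-iteration records and the mean test error across iterations."""
    rng = random.Random(seed)
    mean, sd = scale
    saved = []
    for _ in range(n_iter):                                   # S62: repeat
        people = sorted(data)
        rng.shuffle(people)                                   # S54: allocate persons
        cut1, cut2 = int(0.6 * len(people)), int(0.8 * len(people))
        sets = []
        for ids in (people[:cut1], people[cut1:cut2], people[cut2:]):
            pairs = [pair for pid in ids for pair in data[pid]]
            # S55/S56: the same scaling parameters are applied to every set
            sets.append(([(x - mean) / sd for x, _ in pairs],
                         [y for _, y in pairs]))
        (x_tr, y_tr), (x_op, y_op), (x_te, y_te) = sets
        betas = {lam: fit_ridge_1d(x_tr, y_tr, lam) for lam in lambdas}    # S57
        best = min(lambdas, key=lambda lam: mse(x_op, y_op, betas[lam]))   # S58
        test_error = mse(x_te, y_te, betas[best])                          # S59, S60
        saved.append({"lambda": best, "beta": betas[best],
                      "test_mse": test_error})                             # S61
    overall = sum(rec["test_mse"] for rec in saved) / len(saved)           # S63
    return saved, overall
```

Keeping the optimisation set separate from the test set means the hyperparameter is never tuned on the data used to estimate predictive ability.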
When the specified number of iterations has been completed, the overall measure of predictive ability across all iterations is calculated (step S63) and the model parameters (for example, the scaling parameters and the coefficients) are output.
Known biological substances with distinct spectrum patterns, for example, urea in urine data, may be identified and subtracted from the spectrum before further analysis.
It will be appreciated that various modifications may be made to the embodiments hereinbefore described. Such modifications may involve equivalent and other features which are already known in the methods of biological substance and metabolite analysis and component parts thereof and which may be used instead of or in addition to features already described herein. Features of one embodiment may be replaced or supplemented by features of another embodiment.
Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure of the present invention also includes any novel features or any novel combination of features disclosed herein either explicitly or implicitly or any generalization thereof, whether or not it relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as does the present invention. The applicants hereby give notice that new claims may be formulated to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.
Number | Date | Country | Kind
---|---|---|---
2111739.5 | Aug 2021 | GB | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/GB2022/052116 | 8/12/2022 | WO |