The invention relates to a spectroscopic method for characterizing an agri-food product, in particular intended for determining the naturality, freshness and authenticity of such a product, or even the conformity of same with a target product. The invention also relates to a device for implementing such a method.
The method is based on chemometric methods, and in particular on multivariate or, preferably, multi-way statistical analysis of natural-fluorescence spectra. Multi-way analysis is the natural extension of multivariate analysis when the data are arranged in tables with three or more ways. In this respect, reference may be made to the reference work by R. Bro, “Multi-way Analysis in the Food Industry: Models, Algorithms, and Applications”, PhD thesis, University of Amsterdam, 1998.
The “naturality” and the “freshness” of stored or transformed agri-food products—i.e. their proximity with respect to the initial fresh products—are important parameters both for consumers and for producers. Unfortunately, these parameters are difficult to define precisely, and even more difficult to quantify. The invention is in fact directed toward enabling such a quantification.
The method of the invention also makes it possible to evaluate the “authenticity” or the “standardization” of an agri-food product compared with a reference product (region of origin, land, manufacturing method, etc.) or “standard” product.
In general, the concepts of naturality (a), freshness (b), authenticity (c) or standardization (c′) denote the modifications of qualities of a product, revealed by the changes in its physicochemical properties compared with a natural product which is nontransformed (a), fresh without storage (b), authentic according to precise and recognized specifications (c), and standard according to specifications internal to the company which manufactures the product (c′).
The changes in physicochemical properties are themselves made explicit by the changes in fluorescence which reveal chemical composition modifications, through natural or neoformed fluorophores, or optical properties, such as absorbance (linked to the color), and scattering (linked to the macromolecular organization).
The calculation of the distance between reduced representations of the spectra, having F variables, constitutes a means for quantifying the changes that occur during a technological transformation (a), during storage under specific conditions (b), or between various manufacturing modes (c, c′), which are more or less standard or authentic.
In any event, it is possible to consider a representative population of samples of the reference product, so as to take into account the inevitable variability on said product. Indeed, the differences in spectral signature between the product to be characterized and the reference product are significant only if they exceed those which can be attributed to the variability between samples of the reference product.
This method may constitute a standardization tool at the service of the food-processing industry, making it possible to compare any manufactured product with a target sample, judged to be optimal. It may also constitute a tool for characterizing a storage, transformation or production method; in this case, the focus is not on isolated samples, but on representative populations of such samples, resulting from the method to be characterized.
A method for characterizing one or more samples of an agri-food product, or more broadly any product subjected to a method of manufacture (cosmetic, medicament, etc.), transformation or storage according to the invention, is characterized in that it comprises:
a) illuminating said or each sample to be analyzed with a plurality of excitation light radiations at respective wavelengths;
b) acquiring natural-fluorescence spectra of said or of each sample, each corresponding to a respective excitation light radiation;
c) applying a multivariate or multi-way analysis method to said spectra, which provides a number F of variables representative of said or of each sample, such that said or each sample can be represented by a point in a space having F dimensions;
d) calculating a distance, in said space having F dimensions, between the point representing said or each sample and a target representing one or more reference samples; and
e) determining a characteristic of said or of each sample according to said distance.
In one particularly simple case, the distance calculated can itself be used to express the desired characteristic; in this case, steps d) and e) are carried out jointly.
According to various advantageous characteristics of the invention, taken in isolation or in combination:
Another subject of the invention is a device for spectroscopic analysis of at least one sample, comprising:
According to various advantageous characteristics of the invention, taken in isolation or in combination:
Other characteristics, details and advantages of the invention will emerge on reading the description given with reference to the appended drawings, provided by way of example and which represent, respectively:
The method according to one embodiment of the invention uses the natural-fluorescence signal emitted at the surface of the food after illumination with light beams having predetermined wavelengths in the UV-visible range (approximately: 250-750 nm). This signal is analyzed by chemometric methods which make it possible to extract the information that correlates with the “naturality” or “freshness” characteristics that it is desired to quantify. The existence of such correlation is deduced from the fact that, during the agricultural production, the storage and the transformation, the intrinsic fluorescence of the natural constituents of the food (vitamins, proteins and other constituents that are natural or intentionally or unintentionally added), and also their reflectance, change, while, at the same time, new signals appear owing to the formation of new molecules. The term “neoformed fluorescence” or “acquired fluorescence” is used depending on whether the fluorophores are formed de novo or originate from the environment. The joint change in the native signals (NS), neoformed signals (NFS) and newly acquired signals (NAS) correlates robustly with the physical, physicochemical, chemical or microbiological modifications of the food, in particular with the changes in the quality parameters, induced during the production, the storage and/or the transformation. The change factors which influence the quality of the food are oxidizing ultraviolet radiation, destruction of microorganisms or, on the contrary, the development of some of them which can lead to the synthesis of mycotoxins, human intervention on crops (fertilizers, pesticides, etc.), or the application of processes which modify the temperature, the pressure or any other physical parameter within the food and which consequently cause a modification of the physicochemical composition and of the quality parameters.
The excitation light radiations have wavelengths chosen so as to explore the UV-visible spectrum as widely as possible. In general, it is possible to choose, a priori:
These wavelengths may be modified according to the specific application. In any event, for greater accuracy, it is possible to choose the wavelengths that are as close as possible to the maxima of the loadings representing the excitation vector obtained by the PARAFAC decomposition of a complete EEM matrix obtained with a laboratory fluorimeter on a batch of representative samples. In general, studying the fluorescence spectra using a visible or ultraviolet excitation radiation provides more information on the transformations undergone by the agri-food products than studying the infrared spectra.
Preferably, the number of wavelengths used is between 2 and 6, advantageously between 3 and 5, and preferably equal to 5, making it possible to excite in turn a corresponding number of groups of fluorophores among those described above. The use of such a restricted number of wavelengths is advantageous for making it possible to implement the method of the invention in an industrial environment. It contrasts with the conventional chemometric techniques, based on the use of a large number of excitation wavelengths. On the other hand, it imposes additional constraints, as will be discussed below.
The intensity of the excitation radiation is chosen such that the fluorescence emission energy of these fluorophores is significantly modified during the transformation (pasteurization, sterilization, etc.) or during the storage of the product to be characterized.
The light sources may typically be light-emitting diodes, or even lasers (preferably semiconductor lasers) if greater intensities are required.
The analysis of the spectroscopic data, performed by the MTD computer, comprises four main steps:
The preprocessing of the spectra, with reference to a specific example, in which a sample of roasted chicory is illuminated successively with three excitation radiations at 280, 340 and 429 nm, is first considered. Each of the three fluorescence spectra consists of 1515 spectral intensity values for as many different wavelengths λ. The spectral resolution is 0.25 nm/pixel, but it can be divided by 5, or even more, without any degradation of the results of the method of the invention being observed.
The “raw” spectra (
For this reason, it is preferable to eliminate the contribution of the first-order Rayleigh scatter by means of an innovative technique which is based on the prediction of the region of scattering which overlaps the fluorescence via a generalized linear model (GLZ) with a log link function. In this respect, see:
The technique has already been described in French patent application Ser. No. 09/06088 filed on Dec. 16, 2009 in the name of the applicant and not published at the date of filing of the present application.
This model is calibrated on a region of the spectrum in which the contribution of the fluorescence is negligible, and the spectral intensity is attributed exclusively to the scattering (reference RD in
The generalized linear model has the form:
$$f(\mu_y) = b_0 + b_1 x$$
In the equation, $f(\mu_y)$ is the link function of $\mu_y$, the expected value of y, with y being the vector of the Rayleigh scattering intensities which do not overlap with the fluorescence, while x is a vector of indices (1, 2, 3, etc.) of the same size as y.
With the generalized model, nonlinear relationships (between x and y) can be modeled via the link function. The generalized model can be used to model dependent variables having distributions belonging to the exponential family (Normal, Gamma, Poisson, etc.). Multiple linear regression is a special case of the GLZ model which corresponds to a link function equal to the identity function and to a dependent variable (y) having a normal distribution.
The $b_i$ parameters of the GLZ model are estimated by the statistical method of maximization of likelihood (L):
$$L = F(Y, \text{model}) = \prod_{i} p[y_i, b_i],$$
The objective is to find the parameters that give the greatest probability (joint density) of producing y for all the observations. An iterative estimation (Fisher scoring algorithm, which is a quasi-Newton method) is used to find the $b_i$ parameters, while maximizing L:
Once the bi parameters have been estimated, the GLZ model is applied to the scattering indices corresponding to the spectral region superimposed with the fluorescence (FIG. 2B—region of prediction RP) in order to predict the intensities of the “pure” scattering. After this prediction, the complete scattering spectra (real and predicted parts) are subtracted from the EEMs to obtain the spectrum of pure fluorescence SF (FIG. 2C—to be compared with
In
The elimination of the Rayleigh scattering is particularly important when the analysis is based on front-face fluorescence spectra of dense and therefore turbid samples. However, the invention can also be implemented on dilute samples, which makes the preprocessing of the data less critical.
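The scatter-removal procedure described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes a Poisson-type variance function (the text only specifies a log link and an exponential-family distribution), and the function names `fit_log_link_glm` and `subtract_scatter`, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def fit_log_link_glm(x, y, n_iter=25):
    """Fit log(E[y]) = b0 + b1*x by Fisher scoring (IRLS).

    Assumption: Poisson-type variance (Var[y] = mu); requires y > 0.
    """
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    # Starting values: ordinary least squares on log(y).
    b = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ b
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response for the log link
        WX = X * mu[:, None]             # IRLS weights are mu for this model
        b_new = np.linalg.solve(X.T @ WX, WX.T @ z)
        converged = np.max(np.abs(b_new - b)) < 1e-10
        b = b_new
        if converged:
            break
    return b

def subtract_scatter(spectrum, n_calib):
    """Remove the scatter tail from `spectrum`.

    The first n_calib points are taken as pure scatter (calibration region);
    the remaining points are the region where scatter overlaps fluorescence.
    """
    idx = np.arange(1, spectrum.size + 1, dtype=float)   # indices 1, 2, 3, ...
    b = fit_log_link_glm(idx[:n_calib], spectrum[:n_calib])
    scatter = spectrum.copy()                             # real (measured) part
    scatter[n_calib:] = np.exp(b[0] + b[1] * idx[n_calib:])  # predicted part
    return spectrum - scatter, b

# Synthetic example: exponentially decaying scatter plus a fluorescence band.
idx = np.arange(1, 61, dtype=float)
true_scatter = 100.0 * np.exp(-0.05 * idx)
true_fluo = np.where(idx > 30, 20.0 * np.exp(-0.5 * ((idx - 45.0) / 5.0) ** 2), 0.0)
pure, b = subtract_scatter(true_scatter + true_fluo, n_calib=30)
```

In this sketch the calibration region is reduced to zero by construction, since its intensity is attributed entirely to scattering, and only the prediction region retains a fluorescence signal.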
The successive operations no longer relate to the three fluorescence spectra considered individually, but to the concatenated spectra, i.e. spectra arranged one after another in a single column (see
The concatenated spectra can be subjected, as appropriate, to:
where j is the index of the wavelengths of the concatenated emission spectrum (j=1 to 4545) and
or
In this respect, see the abovementioned publication by R. Bro and also the article by M. S. Dhanoa et al. “The link between multiplicative scatter correction (MSC) and standard normal variate (SNV) transformations of NIR spectra.” J. Near Infrared Spectrosc. 1994, 2, 43-47.
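As an illustration of the concatenation and of the SNV (standard normal variate) transformation cited above, a minimal sketch is given below. It assumes the usual SNV definition (each concatenated spectrum is centered on its own mean and divided by its own standard deviation); the function name `snv` and the random test spectra are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Row-wise standard normal variate.

    Each row (one concatenated spectrum) is centered on its own mean and
    scaled by its own sample standard deviation.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Three emission spectra of 1515 points each (one per excitation wavelength),
# placed one after another to give a single vector of 3 x 1515 = 4545 values.
rng = np.random.default_rng(0)
spectra = [rng.random(1515) + 1.0 for _ in range(3)]
concatenated = np.concatenate(spectra)[None, :]   # shape (1, 4545)
z = snv(concatenated)
```

Because SNV removes any additive offset and any multiplicative gain applied to a whole spectrum, `snv(2 * concatenated + 3)` gives the same result as `snv(concatenated)`.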
After having acquired and processed the fluorescence spectra of a certain number of calibration samples (at least two, corresponding to extreme values of the parameter to be quantified; preferably more), it is possible to determine the multi-way statistical model that will be used for the analysis of other samples to be characterized. This comprises the calculation of the “loading” vectors of this model.
The case of a trilinear model of “PARAFAC” type is first considered. The principle, shown in
where: “i” is the index of the samples, “j” is that of the excitation radiations, “k” is that of the emission wavelengths, and “f” is that of the F PARAFAC decomposition factors.
The concatenation of the spectra makes it possible to write the PARAFAC decomposition in matrix form:
$$X^{(I \times JK)} = A\,(C \odot B)^{T};$$
where:
The loading vectors are calculated on the basis of the calibration samples chosen so as to be representative of the total variability expected in the group of samples to be subsequently characterized. For example, if it is desired to quantify the naturality of a product, calibration samples corresponding to fresh products (perfect naturality), others corresponding to highly transformed products (low naturality), and preferably even others corresponding to intermediate naturality values, will be used. The same is true for the other parameters to be quantified. It should be considered that the model is empirical; it is therefore valid only for samples similar to those that were used for the calibration.
The parameters A, B and C of the PARAFAC model can be calculated by the method of alternating least squares (nonlinear iterative method). In this method, a first estimate of the matrix A is calculated conditionally on initial random values assigned to B and C, so as to minimize the sum of the squares of the residuals. The parameter B is then updated using the estimate of A, then the parameter C is updated using the new value of B, and so on. Each iterative update of A, B and C therefore improves the solution (descent on the error surface). The algorithm is considered to have converged when the improvement in the solution over one iteration becomes very small (by default, this criterion is $10^{-6}$).
The convergence of the PARAFAC model for data that are weakly resolved and that, furthermore, are not truly trilinear, may be problematic (see the above-mentioned publications by R. Bro and Rizkallah). Indeed, the error surface of the model may contain local minima (points lower than in their vicinity, but higher than the true minimum of the surface), saddle points, flat areas, and very narrow valleys which can slow down or even prevent the convergence of the algorithm.
The present inventors have noted that, by imposing a limitation on the number of iterations (for example, a maximum of 30 iterations) and/or by increasing, i.e. loosening, the convergence criterion (for example $10^{-2}$ instead of $10^{-6}$), the model is significantly improved at the level of the parameters (loadings and scores). Indeed, an excessive number of iterations can degrade the relevance of the model.
The convergence can be facilitated by imposing a constraint of non-negativity, since it is known, a priori, that the fluorescence spectra and also their relative intensities cannot take negative values.
Finally, it has been noted that the preprocessing for elimination of the Rayleigh scattering by applying a GLZ model facilitates the convergence of the iterative method for calculating the loading and score matrices of the PARAFAC model.
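The alternating least squares procedure, with a capped number of iterations and a loosened convergence criterion, can be sketched as follows. This is an illustrative sketch only: the non-negativity constraint is not implemented, the unfolding convention (excitation index running fastest within each emission block) is an assumption, and the names `khatri_rao` and `parafac_als` are hypothetical.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product; rows ordered with the B index fastest."""
    K, F = C.shape
    J = B.shape[0]
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, F)

def parafac_als(X, F, max_iter=30, tol=1e-2, seed=0):
    """Fit X[i, j, k] ~ sum_f A[i, f] * B[j, f] * C[k, f] by ALS.

    Stops after max_iter sweeps, or when the relative decrease of the sum of
    squared residuals over one sweep is at most tol.
    """
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B = rng.random((J, F))                 # random initial loadings
    C = rng.random((K, F))
    # Unfold the cube along each of its three ways.
    X1 = X.transpose(0, 2, 1).reshape(I, K * J)
    X2 = X.transpose(1, 2, 0).reshape(J, K * I)
    X3 = X.transpose(2, 1, 0).reshape(K, J * I)
    history = []
    for _ in range(max_iter):
        # Each update is a linear least-squares solve with the other two fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(C, B)).T
        B = X2 @ np.linalg.pinv(khatri_rao(C, A)).T
        C = X3 @ np.linalg.pinv(khatri_rao(B, A)).T
        sse = float(np.sum((X1 - A @ khatri_rao(C, B).T) ** 2))
        stop = bool(history) and history[-1] - sse <= tol * history[-1]
        history.append(sse)
        if stop:
            break
    return A, B, C, history

# Synthetic trilinear cube: 6 samples, 3 excitations, 40 emission wavelengths.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((6, 2)), rng.random((3, 2)), rng.random((40, 2))
X = np.einsum("if,jf,kf->ijk", A0, B0, C0)
A, B, C, history = parafac_als(X, F=2)
```

The `history` list records the sum of squared residuals after each sweep; since every update is an exact least-squares step, this sequence does not increase.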
In the case of perfectly trilinear data, the PARAFAC model in theory admits only a single (unique) solution. However, in the case of an EEM limited to three excitations, and when the fluorescence is collected at the surface, the trilinearity of the EEM matrix is greatly disrupted. As a result, the solution is no longer unique and it is necessary to develop criteria for selecting the number of factors and also the final model.
This choice is guided by several criteria known per se (see the abovementioned reference work by R. Bro):
The PARAFAC model thus constructed is applied to the EEM of each new sample to be characterized or, more generally, to the EEMs of several samples to be characterized at the same time, assembled so as to form a data cube.
The new scores are calculated on the basis of the “loading” matrices B and C of the model, as follows:
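$$A_{\mathrm{new}} = (B \odot C)^{+}\, X_{\mathrm{new}};$$
where $\odot$ denotes the column-wise tensor (Khatri-Rao) product of the loading matrices, the superscript + indicates the generalized inverse of this product and $X_{\mathrm{new}}$ indicates the concatenated and preprocessed spectral data of the new samples.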
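The calculation of the new scores from fixed loadings can be sketched as follows. The sketch assumes that each new sample is stored as an excitation-by-emission array and unfolded with the excitation index running fastest; the order of the factors in the product and the placement of the transposes follow from that assumed layout. The names `khatri_rao` and `project_scores` are hypothetical.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product; rows ordered with the B index fastest."""
    K, F = C.shape
    J = B.shape[0]
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, F)

def project_scores(X_new, B, C):
    """Least-squares scores of new samples for fixed loadings B and C.

    X_new has shape (n_samples, J, K); the result has shape (n_samples, F).
    """
    n = X_new.shape[0]
    X1 = X_new.transpose(0, 2, 1).reshape(n, -1)     # unfold to n x (K*J)
    return X1 @ np.linalg.pinv(khatri_rao(C, B)).T   # generalized inverse

# Check on exactly trilinear synthetic data: the known scores are recovered.
rng = np.random.default_rng(2)
A0, B0, C0 = rng.random((4, 2)), rng.random((3, 2)), rng.random((25, 2))
X_new = np.einsum("if,jf,kf->ijk", A0, B0, C0)
A_new = project_scores(X_new, B0, C0)
```

With real spectra the recovered scores are only least-squares estimates, since the data are not exactly trilinear.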
PARAFAC decomposition can be considered to be a technique for dimensional reduction of spectroscopic data. Before decomposition, each sample is identified by 3×1515=4545 spectral intensity values; it can therefore be represented by a point in a space with 4545 dimensions. Afterwards, each sample is identified by the F values of the scores: it is therefore represented by a point (reference PE in
Other methods of multivariate or multi-way analysis can be used in place of the PARAFAC decomposition. By way of nonlimiting example, mention may be made of principal component analysis, or PCA; principal component regression (PCR); partial least squares (PLS) regression or, better still, partial least squares discriminant analysis (PLS-DA). The PLS-DA method has the advantage of taking into consideration the prior knowledge of the various groups of samples having undergone similar treatments, thereby making it possible to optimize the separation thereof. In the case of the principal component analysis or regression, the coordinates of the points PE will not be given by “scores”, but by “principal components”.
Whatever the method of statistical analysis used, the reference samples form a cloud of points in this space of reduced dimensions. This cloud of points makes it possible to determine a “target”, having in particular the form of an interval (with 1 dimension), circle (with 2 dimensions), sphere (with 3 dimensions) or hypersphere (with more than 3 dimensions).
For each point PE, corresponding to a sample to be characterized, it is possible to determine a distance D from the target (measured from the center of the target or from its periphery, according to the embodiment considered). The distance D may be the ordinary Euclidean distance; however, in the case of a PARAFAC factorization, it is preferable to use a Mahalanobis distance, which takes into account the non-perfect independence of the “coordinates” defined by the PARAFAC factors. In any event, it is this distance which makes it possible to quantify the freshness and/or the naturality of the sample.
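The two distances mentioned above can be compared with a short sketch. It assumes that the distance is measured from the centroid of the cloud of reference points, that the covariance of that cloud is used for the Mahalanobis distance, and that F is at least 2; the function name `distances_to_target` and the numerical values are hypothetical.

```python
import numpy as np

def distances_to_target(point, reference_scores):
    """Euclidean and Mahalanobis distances from `point` to the centroid of
    the reference cloud (rows = reference samples, columns = F variables)."""
    ref = np.asarray(reference_scores, dtype=float)
    diff = np.asarray(point, dtype=float) - ref.mean(axis=0)
    cov = np.cov(ref, rowvar=False)               # F x F sample covariance
    d_euclid = float(np.linalg.norm(diff))
    d_mahal = float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
    return d_euclid, d_mahal

# Four reference samples in a space with F = 2 dimensions, one sample to test.
reference = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
d_euclid, d_mahal = distances_to_target([4.0, 5.0], reference)
```

In this example the reference cloud has centroid (1, 1) and covariance (4/3)·I, so the Euclidean distance is 5 and the Mahalanobis distance is the square root of 18.75, i.e. about 4.33.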
As a variant, it is possible to define the naturality (or the freshness, the authenticity, etc.) by means of a statistical test such as a Student's t test. The procedure is carried out in the following way.
Several duplicates of the front-face fluorescence EEMs of a sample to be characterized and of a reference sample are acquired and arranged in a three-way cube which is decomposed using a PARAFAC model which is written, in matrix form:
$$X^{(I \times JK)} = A\,(C \odot B)^{T} + E^{(I \times JK)};$$
where B and C are the excitation and emission “loading” matrices, respectively, A is the score matrix and $E^{(I \times JK)}$ is the residual matrix.
A distance indicator vector y is then calculated using a linear regression model:
$$y = b_1 a_1 + b_2 a_2 + \dots + b_F a_F + e.$$
The vector y is binary and takes values of zero for reference samples and of one for the sample to be characterized. The vectors $a_1, \dots, a_F$ are the columns of the matrix A, the scalars $b_1, \dots, b_F$ are the coefficients of the linear model, e is the residual vector and F is the number of PARAFAC factors that is required in order to satisfactorily explain the data. A Student's t test is applied to the distances $\hat{y}_i$ expected on the basis of the EEMs of the reference sample and the distances $\hat{y}_s$ expected on the basis of the EEMs of the sample to be characterized. The statistic t is calculated via the following equation:
where
where Γ is the gamma function and ν is the number of degrees of freedom
Samples having a higher external probability, on the basis of the null hypothesis on the distributions (Ho
The spectra are preprocessed by elimination of the Rayleigh scattering by application of a GLZ model, concatenated, and standardized by the SNV method; a PCA model is then constructed from the preprocessed spectra, and the principal component demonstrating most clearly the difference between the various groups of samples, corresponding to various treatments undergone by the milk, is chosen (in this case, it is the 1st principal component, explaining approximately 65% of the variability of the samples). Descriptive statistics (minimum and maximum values, medians, values of the 1st and 3rd quartile, distance between the centroids of the groups) of the various groups are calculated on the basis of the scores of this principal component. These statistics make it possible to determine whether the groups are actually separated, i.e. whether the distance between their medians is significant.
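The PCA step and the per-group descriptive statistics can be sketched as follows. The sketch computes the PCA by singular value decomposition of the mean-centered data; the group labels "raw" and "heated", the small synthetic data set and the function names `pca_scores` and `group_summary` are hypothetical and do not reproduce the measurements of the example.

```python
import numpy as np

def pca_scores(X, n_components=1):
    """Scores and explained-variance ratios of the leading principal components."""
    Xc = X - X.mean(axis=0)                        # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

def group_summary(scores, labels):
    """Minimum, quartiles, median and maximum of the scores of each group."""
    labels = np.array(labels)
    out = {}
    for g in sorted(set(labels.tolist())):
        v = scores[labels == g]
        q1, med, q3 = np.percentile(v, [25, 50, 75])
        out[g] = {"min": float(v.min()), "q1": float(q1), "median": float(med),
                  "q3": float(q3), "max": float(v.max())}
    return out

# Six hypothetical preprocessed spectra (3 variables each) in two groups.
X = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
              [5.0, 5.0, 0.0], [5.1, 5.0, 0.0], [5.0, 5.1, 0.0]])
labels = ["raw", "raw", "raw", "heated", "heated", "heated"]
scores, explained = pca_scores(X, n_components=1)
summary = group_summary(scores[:, 0], labels)
```

The sign of a principal component is arbitrary, so only the separation between the groups along the component is meaningful, not which group has the positive scores.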
In
The scale on the left of the graph of
Fluorescence spectra were acquired and preprocessed in the same way as for the previous example. A principal component analysis was performed, and the first three principal components PC1, PC2 and PC3 were taken into consideration. The 1st principal component PC1 explains 65.83% of the variability of the samples; the 2nd principal component PC2 explains 22.69% of the variability and the 3rd component PC3 only 5.08%.
In this
The authenticity and/or the conformity of a product are binary parameters: the product is authentic or it is not; it is in conformity or it is not. These parameters can be determined by comparing the distance D of the sample from the centroid of the target to a threshold. For example, this threshold can be equal to RCI: the sample is considered to be authentic/in conformity if its representative point PE is inside the target, adulterated/not in conformity if it is outside the target.
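The binary decision described above reduces to comparing a distance with a threshold, as in the following sketch. A Euclidean distance is assumed for simplicity; the parameter `radius` stands for the threshold (the radius of the target) and, like the function name `is_conforming`, is hypothetical.

```python
import math

def is_conforming(point, centroid, radius):
    """True if the sample point lies inside the target (distance <= radius)."""
    return math.dist(point, centroid) <= radius

# A target centered on the origin with radius 2, and two samples to classify.
inside = is_conforming((1.0, 1.0), (0.0, 0.0), 2.0)    # distance is about 1.41
outside = is_conforming((3.0, 0.0), (0.0, 0.0), 2.0)   # distance is 3
```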
Number | Date | Country | Kind |
---|---|---|---|
1002549 | Jun 2010 | FR | national |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB11/52600 | 6/15/2011 | WO | 00 | 1/16/2013 |