Benefit and priority under 35 U.S.C. § 119(a) is hereby claimed to Chinese Patent Application No. CN 202210881518.5 filed on Jul. 26, 2022, which is hereby incorporated by reference herein in its entirety.
The present disclosure belongs to the technical field of element detection and, in particular, relates to a method for XRF quantitative analysis of heavy metal elements based on LLE-SVR.
At present, toxic elements such as cadmium, mercury, lead, arsenic, zinc, chromium, nickel and copper are commonly found in contaminated soil and are difficult for soil microorganisms to remove. These contaminants are easily transferred to agricultural products such as rice and corn, thereby entering the human food supply chain and causing food poisoning or even carcinogenic risk. Heavy metal pollution has seriously affected human life; because heavy metals are difficult to degrade, accumulate easily and are highly toxic, they significantly influence the growth, yield and quality of crops. Therefore, the first step of treatment and remediation is to determine the content of the various heavy metals in contaminated soil.
Energy-dispersive X-ray fluorescence (ED-XRF) spectrometry, one of the most commonly used techniques for elemental analysis, allows the composition of a substance to be analyzed nondestructively by using X-rays to excite the substance to emit X-ray fluorescence. It offers fast analysis, high precision and good reproducibility, and has important application value in alloy detection, environmental protection and safety, explosives detection, medical testing, mineral analysis and other fields. Because heavy metal elements in soil are present in trace amounts and are of many kinds, their element peaks readily overlap. The existing conventional methods are based on statistical principles and lack data verification, so a new algorithm is needed to determine the heavy metal content accurately.
Through the above analysis, the problems and defects existing in the prior art are as follows: the existing instrument measurement methods give inaccurate results, suffer from peak overlapping interference between elements and from large errors, and cannot accurately predict the element content of the analytes to be measured.
To address the problems existing in the prior art, the present disclosure provides methods for XRF quantitative analysis of heavy metal elements based on LLE-SVR.
The present disclosure is realized by methods for XRF quantitative analysis of heavy metal elements based on LLE-SVR, comprising, for example: establishing the relationship between peak information and element content by using machine-learning-based Locally Linear Embedding (LLE) for dimensionality reduction and Support Vector Regression (SVR) for prediction, so as to quantitatively analyze the content information of the elements contained in substances.
Further, the method for XRF quantitative analysis of heavy metal elements based on LLE-SVR may comprise the following procedures and/or steps of:
Further, the first step of obtaining the soil sample set and constructing the element set based on the soil sample set may comprise:
Further, the second step of normalizing the input matrix and the output matrix of the LLE-SVR model may comprise:
Further, the step of searching for neighbor points and calculating a weight value of the neighbor points based on the normalized matrixes of the input matrix and the output matrix, and performing LLE for dimensionality reduction and calculating a component matrix after LLE for dimensionality reduction may comprise:
Further, the fourth step of performing nonlinear function mapping and constructing a classification hyperplane, and introducing a penalty factor and a slack variable for constraint, and training the LLE-SVR model via parameter optimization to quantitatively predict the element content may comprise:
hp[(w·φ(xp))+b]−1≥0;
wherein hp represents the class marker under p dimension: hp=1 when the sample is located above the hyperplane w·φ(xp)+b, and hp=−1 when it is located below the hyperplane w·φ(xp)+b; p=1, 2, . . . , k, k≤m; w represents the weight value vector of the feature, and b represents the bias; xp represents the element component value vector after the dimensionality reduction of the sample to be measured into p dimensions via LLE, and φ(xp) represents a nonlinear mapping function mapping the data xp to the high-dimensional linear separable feature space;
Further, the fifth step of denormalizing the normalized output matrix of the target element content to obtain the denormalized output matrix, and calculating the coefficient of determination to evaluate the predicting effect of the LLE-SVR model, may comprise:
(2) comparing the predicted target element content ŷi1 with the true target element content yi1, and calculating the coefficient of determination R2:
Another object of the present disclosure is to provide a computer device comprising a storage storing a computer program, and a processor, and when the computer program is executed by the processor, the steps of the method of XRF quantitative analysis of heavy metal elements based on LLE-SVR are performed by the processor.
Another object of the present disclosure is to provide a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the steps of the method of XRF quantitative analysis of heavy metal elements based on LLE-SVR are performed by the processor.
Another object of the present disclosure is to provide an information data processing terminal used to implement the method of XRF quantitative analysis of heavy metal elements based on LLE-SVR.
Combined with the technical problems to be solved and the above technical solutions, the advantages and positive effects of the claimed technical solution in the present disclosure are as follows:
First, the locally linear embedding (LLE) algorithm used in the present disclosure is based on sparse matrix feature decomposition, has relatively low computational complexity and is easy to implement, and maintains the local features of the sample during dimensionality reduction.
The support vector regression (SVR) method used in the present disclosure is a novel method with a solid theoretical foundation, which is suitable for few-shot learning. It maps vectors into a higher-dimensional space where a maximum-margin hyperplane is established. Two parallel hyperplanes are built on both sides of the hyperplane that separates the data, and the separating hyperplane maximizes the distance between the two parallel hyperplanes, so that the element content can be predicted more accurately. The SVR method is insensitive to outliers, effectively captures key samples, and exhibits high robustness and excellent generalization ability.
In the present disclosure, the combined algorithm of the Locally Linear Embedding (LLE) and Support Vector Regression (SVR) is applied to the quantitative analysis of element contents. First, the complex redundant data is eliminated by LLE for dimensionality reduction, and then the SVR method is used for content prediction, the combination of which solves the problem of spectral line interference, such as overlapping peaks and escape peaks, and provides a new means for improving the content detection accuracy of heavy metal elements.
In the application and implementation, the content of element Pb in 57 nationally standard soil samples was predicted with high prediction accuracy, which provides an innovative method and technical support for the detection of heavy metal elements in soil.
Second, the quantitative analysis process of the present disclosure is simple, scientific and reasonable, with high prediction accuracy and intuitive results, and is easy to understand; the quantitative analysis method of the present disclosure has the characteristics of high detection precision and high prediction accuracy, and solves the problems of inaccuracy and peak overlapping interference between elements from traditional instrument measurement methods by establishing the relationship between element component value and element content, reducing the influence of environmental background, and realizing accurate prediction of the element content contained in the analytes.
Third, the present disclosure proposes a method of XRF quantitative analysis of heavy metal elements based on LLE-SVR, which improves the accuracy of element content prediction, i.e., it can more accurately analyze the geographical location of the pollution source, provide effective support for early treatment of soil environmental pollution, and directly reduce the cost of the soil treatment market by about 15%.
Research on the estimation of heavy metal elements has long been ongoing, but low system accuracy has always been a challenge in the industry; the present disclosure improves the accuracy and stability of the system by the method of XRF quantitative analysis of heavy metal elements based on LLE-SVR.
In order to make the objectives, technical solution and advantages of the present disclosure clearer, the present disclosure will be further explained in detail with reference to the following examples. It should be understood that the specific examples described herein are only for explaining the present disclosure, but not for limiting the present disclosure.
As shown in
S101, a soil sample set was obtained and an element set was constructed based on said soil sample set; the ED-XRF fluorescence spectrometer was used to identify the peak information and content information of the elements in the sample to be measured in the soil sample set corresponding to that in the element set, to obtain the measured component value and content value of each element.
S102, an LLE-SVR model was constructed, and the input matrix and the output matrix of the LLE-SVR model were determined; the input matrix and the output matrix of the LLE-SVR model were normalized to obtain the normalized matrixes of the input matrix and the output matrix.
S103, the neighbor points were searched for based on the normalized matrixes of the input matrix and the output matrix and the weight values of the neighbor points were calculated; the LLE was performed for dimensionality reduction, and the component matrix after LLE for dimensionality reduction was calculated.
S104, the nonlinear function mapping was performed and the classification hyperplane was constructed; a penalty factor and a slack variable were introduced for constraint, and the LLE-SVR model was trained by parameter optimization for quantitative prediction of element content.
S105, the normalized output matrix of the target element content was denormalized to obtain a denormalized output matrix; the coefficient of determination was calculated to evaluate the prediction effect of the LLE-SVR model.
An example of the present disclosure provides a method of XRF quantitative analysis of heavy metal elements based on LLE-SVR, comprising, for example, the following steps:
Step 1: a soil sample set was determined. The soil sample set was assumed to contain n samples (here n = 57), namely sample 1, sample 2, . . . , sample 57. All elements that could be identified by the spectrometer in each sample were taken to constitute the element set of that sample, finally giving 57 element sets A1-A57; the union set of A1-A57 is the element set A contained in the soil sample set, and the element set A is the element library of element Nos. 12-92 in the periodic table.
Step 2: 57 national standard samples were adopted as the standard samples, comprising three types of standard substances: the GSS series, the GBW series and the GSD series. The GSS Series Reference Material of Chemical Composition of Soil comprises GSS-1, GSS-2, GSS-3, GSS-4, GSS-5, GSS-6, GSS-7, GSS-8, GSS-9, GSS-10, GSS-11, GSS-12, GSS-13, GSS-14, GSS-15, GSS-16, GSS-17, GSS-18, GSS-19, GSS-20, GSS-21, GSS-22, GSS-23, GSS-24, GSS-25, GSS-26, GSS-27, GSS-32; the GBW Series Reference Material of Chemical Composition of Soil comprises GBW0070003, GBW0070004, GBW0070005; and the GSD Series Reference Materials for the Chemical Composition of Stream Sediments comprises GSD-2a, GSD-3, GSD-5a, GSD-9, GSD-11, GSD-12, GSD-14, GSD-15, GSD-16, GSD-17, GSD-18, GSD-19, GSD-20, GSD-21, GSD-22, GSD-23, GSD-25, GSD-26, GSD-27, GSD-28, GSD-29, GSD-30, GSD-31, GSD-32, GSD-33. The XRF spectrogram of each sample, together with the component value X and the content value Y of the elements contained in the sample, can be obtained simultaneously by the XRF spectrometer, and the XRF spectrogram of the standard soil sample was shown in
Step 3: In the element set A, the union set of the studied target elements and their corresponding interfering elements was taken as the input variable of the LLE-SVR model. The example mainly studied eight harmful elements of soil: 23 (V), 24 (Cr), 25 (Mn), 27 (Co), 29 (Cu), 30 (Zn), 48 (Cd) and 82 (Pb). The 57 standard soil samples were taken as examples to record the component content of the target elements. The matrix of the measured component values composed of the target elements and their interfering elements was taken as the input of the LLE-SVR model, and the matrix of the target element Pb content was used as the output of the LLE-SVR model; the details of the interfering elements are shown in Table 1.
Step 4: the XRF spectrum data was normalized: the input matrix Xnm and the output matrix Yn1 were normalized to obtain the normalized matrixes
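As a non-limiting illustration of the normalization in Step 4, the following Python sketch assumes column-wise min-max scaling to the range [0, 1]. The scaling rule, the function name `min_max_normalize` and the sample values are assumptions made for illustration only and are not taken from the present disclosure.

```python
from typing import List, Tuple


def min_max_normalize(
    matrix: List[List[float]],
) -> Tuple[List[List[float]], List[float], List[float]]:
    """Scale every column of `matrix` to [0, 1].

    Returns the scaled matrix together with the column minima and maxima,
    which are needed later to undo the scaling.
    """
    n_cols = len(matrix[0])
    col_min = [min(row[j] for row in matrix) for j in range(n_cols)]
    col_max = [max(row[j] for row in matrix) for j in range(n_cols)]
    scaled = []
    for row in matrix:
        scaled_row = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            # A constant column carries no information; map it to 0.0
            # instead of dividing by zero.
            scaled_row.append((value - col_min[j]) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled, col_min, col_max


# Hypothetical 3-sample, 2-element component matrix (illustrative values).
X = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
X_norm, x_min, x_max = min_max_normalize(X)
```

Keeping the column minima and maxima alongside the scaled matrix allows the same scaling to be reversed on the predicted output.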
Step 5: the neighbor points were searched for: in the local neighborhood, the Euclidean distance between each normalized sample point x′ij and the points of the other n-1 samples was calculated, and the I nearest neighbor points of x′ij were selected;
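The neighbor search of Step 5 may, for example, be sketched as below: the Euclidean distance from one normalized sample to every other sample is computed and the closest samples are kept. The function name `nearest_neighbors`, the parameter `n_neighbors` and the sample values are hypothetical and serve only to illustrate the step.

```python
import math
from typing import List


def nearest_neighbors(
    samples: List[List[float]], index: int, n_neighbors: int
) -> List[int]:
    """Return the indices of the `n_neighbors` samples closest to samples[index]."""
    distances = []
    for other, row in enumerate(samples):
        if other == index:
            continue  # a sample is not its own neighbor
        distances.append((math.dist(samples[index], row), other))
    distances.sort()  # ascending Euclidean distance
    return [other for _, other in distances[:n_neighbors]]


# Four hypothetical normalized samples with two features each.
samples = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 0.2]]
neighbors_of_first = nearest_neighbors(samples, 0, 2)
```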
Step 6: the weight value W of the neighbor points was calculated: the weight value W between each sample point x′ij and its I neighbor points was calculated, and the calculation formula for the weight value W was as follows
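By way of a hedged illustration only, the reconstruction weights of Step 6 can be sketched under the standard LLE formulation: the local Gram matrix of the differences between a sample and its neighbors is regularized, the system G·w = 1 is solved, and the weights are rescaled to sum to 1. This standard formulation, the regularization constant `reg` and the helper names are assumptions and are not asserted to be the exact formula of the present disclosure.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def reconstruction_weights(
    point: List[float], neighbors: List[List[float]], reg: float = 1e-3
) -> List[float]:
    """Weights that best reconstruct `point` from `neighbors`, summing to 1."""
    diffs = [[p - q for p, q in zip(point, nb)] for nb in neighbors]
    gram = [[sum(u * v for u, v in zip(di, dj)) for dj in diffs] for di in diffs]
    trace = sum(gram[i][i] for i in range(len(gram)))
    # Assumed regularization: keeps the Gram matrix invertible when the
    # neighbors are linearly dependent.
    ridge = reg * trace if trace > 0 else reg
    for i in range(len(gram)):
        gram[i][i] += ridge
    raw = solve_linear_system(gram, [1.0] * len(gram))
    total = sum(raw)
    return [w / total for w in raw]


# A point midway between its two hypothetical neighbors gets equal weights.
weights = reconstruction_weights([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]])
```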
Step 7: the LLE was used for dimensionality reduction: according to the reconstruction weight value, all sample data points were mapped into the low-dimensional space to obtain the low-dimensional output Z, and the local linear features in the high-dimensional space were maintained as much as possible to minimize the reconstruction error function. The m-dimensional data of each row in the matrix Xnm was mapped to the k-dimensional data by nonlinear function mapping, i.e., obtaining k principal components after the local linear embedding method (LLE) for dimensionality reduction, and using the k-dimensional data to reflect the information expressed in the original m-dimensional data, and the dimensionality-reduced feature was:
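For orientation, the reconstruction error function minimized in Step 7 is, in the standard LLE formulation, the following embedding cost. The notation (z_i for the k-dimensional image of sample i, w_ij for the reconstruction weights, with w_ij = 0 when sample j is not a neighbor of sample i) is assumed here for illustration and is not asserted to be the exact expression of the present disclosure.

```latex
\Phi(Z) = \sum_{i=1}^{n} \Bigl\lVert z_i - \sum_{j=1}^{n} w_{ij}\, z_j \Bigr\rVert^{2},
\qquad \text{subject to}\quad \sum_{i=1}^{n} z_i = 0,
\quad \frac{1}{n}\sum_{i=1}^{n} z_i z_i^{\top} = I_k
```

Under these constraints the minimizer is given by the eigenvectors of M = (I − W)ᵀ(I − W) associated with its k smallest nonzero eigenvalues, which is why the weights obtained in the high-dimensional space carry the local linear structure into the low-dimensional output Z.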
Step 8: the k-dimensional element component value data was mapped from the low-dimensional nonlinear separable space to a high-dimensional linear separable feature space, and a classification hyperplane was constructed in this high-dimensional linear separable feature space:
hp[(w·φ(xp))+b]−1≥0 (10)
Step 9: a penalty factor C and a slack variable ξp were introduced for constraint and the classification hyperplane problem was converted into a quadratic programming model:
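As a hedged point of reference, relaxing the constraint of formula (10) in the standard soft-margin way yields the quadratic program below, where C denotes the penalty factor and ξp the slack variable, following the notation of Step 10. This is the textbook form implied by formula (10) and is an assumption; it is not asserted to be identical to formula (11) of the present disclosure.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{p=1}^{k}\xi_p
\quad \text{subject to}\quad
h_p\bigl[(w\cdot\varphi(x_p)) + b\bigr] \ge 1 - \xi_p,
\qquad \xi_p \ge 0,\quad p = 1, 2, \ldots, k
```

Setting every ξp to zero recovers the hard constraint hp[(w·φ(xp))+b]−1≥0, while a larger C penalizes violations of that constraint more heavily.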
Step 10: the LLE-SVR model was trained, with its parameters optimized by cross-validation based on grid search. The optimal penalty factor C and the optimal slack variable ξp were obtained by iteratively searching for the optimal parameters. The Lagrangian multiplier αp and the kernel function K were introduced to solve the formula (11), and the minimum classification hyperplane satisfying the required precision gave the prediction result ŷ′i of the target element content; the calculation formula for predicting the target element content of any ith sample to be measured was:
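The grid-search cross-validation of Step 10 may be illustrated by the following minimal sketch. To stay self-contained it uses a stand-in model, ridge regression through the origin, in place of SVR, so the searched parameter is a ridge penalty and not the SVR penalty factor C; the stand-in model, the fold layout and all names are assumptions made purely to show the search loop.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, n_folds: int) -> List[List[int]]:
    """Split range(n_samples) into `n_folds` interleaved validation folds."""
    return [list(range(fold, n_samples, n_folds)) for fold in range(n_folds)]


def grid_search_cv(
    xs: Sequence[float],
    ys: Sequence[float],
    grid: Sequence[float],
    fit: Callable,
    predict: Callable,
    n_folds: int = 3,
) -> Tuple[float, float]:
    """Return the grid value with the lowest mean validation error, and that error."""
    folds = k_fold_indices(len(xs), n_folds)
    best_param, best_error = None, float("inf")
    for param in grid:
        fold_errors = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(xs)) if i not in held]
            model = fit([xs[i] for i in train], [ys[i] for i in train], param)
            squared = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
            fold_errors.append(sum(squared) / len(squared))
        mean_error = sum(fold_errors) / len(fold_errors)
        if mean_error < best_error:
            best_param, best_error = param, mean_error
    return best_param, best_error


# Hypothetical stand-in model (NOT SVR): ridge regression through the origin.
def fit_ridge(xs: Sequence[float], ys: Sequence[float], penalty: float) -> float:
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def predict_ridge(slope: float, x: float) -> float:
    return slope * x


# Noise-free toy data y = 2x, so the unpenalized fit is exact.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]
best_penalty, best_mse = grid_search_cv(
    xs, ys, [0.0, 1.0, 10.0], fit_ridge, predict_ridge
)
```

In an actual implementation the `fit` and `predict` callables would wrap an SVR solver and the grid would range over the penalty factor C and the kernel parameters.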
Step 11: the XRF spectrum data was denormalized; the normalized output matrix Ŷ′n1 of the target element content was denormalized to obtain the denormalized output matrix Ŷn1. The process of denormalizing the matrix was as follows:
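Assuming, for illustration only, that the output matrix was normalized by min-max scaling to [0, 1], the denormalization of Step 11 is the inverse mapping sketched below. The scaling rule, the function name `min_max_denormalize` and the numeric values are assumptions and are not taken from the present disclosure.

```python
from typing import List


def min_max_denormalize(
    scaled: List[float], original_min: float, original_max: float
) -> List[float]:
    """Map values from [0, 1] back to the original range [original_min, original_max]."""
    span = original_max - original_min
    return [value * span + original_min for value in scaled]


# Hypothetical normalized predictions and hypothetical content range.
y_pred_norm = [0.0, 0.25, 1.0]
y_pred = min_max_denormalize(y_pred_norm, 20.0, 100.0)
```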
Step 12: the predicted target element content ŷi1 was compared with the true target element content yi1, and the coefficient of determination (R2) was calculated. The calculation formula for R2 was as follows:
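The evaluation of Step 12 may be sketched as follows, assuming the usual definition of the coefficient of determination, R2 = 1 − SSres/SStot, where SSres is the sum of squared prediction errors and SStot is the sum of squared deviations of the true values from their mean. The definition is the conventional one and the numeric values are hypothetical; neither is asserted to reproduce the formula or the results of the present disclosure.

```python
from typing import Sequence


def coefficient_of_determination(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> float:
    """Return R2 = 1 - SS_res / SS_tot for paired true and predicted values."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted contents (illustrative values only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 5.0]
r2 = coefficient_of_determination(y_true, y_hat)
```

A value of R2 close to 1 indicates that the predicted contents track the true contents closely.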
The method of XRF quantitative analysis of heavy metal elements based on LLE-SVR provided by the example of the present disclosure was applied with a computer device, which comprised a storage storing a computer program, and a processor, and when the computer program was executed by the processor, the processor performed the steps of the method of XRF quantitative analysis of heavy metal elements based on LLE-SVR.
The method of XRF quantitative analysis of heavy metal elements based on LLE-SVR provided by the example of the present disclosure was applied with a computer-readable storage medium storing a computer program, and when the computer program was executed by the processor, the processor performed the steps of the method of XRF quantitative analysis of heavy metal elements based on LLE-SVR.
The method of XRF quantitative analysis of heavy metal elements based on LLE-SVR provided by the example of the present disclosure was applied with an information data processing terminal, which is used to realize the method of XRF quantitative analysis of heavy metal elements based on LLE-SVR.
Taking the heavy metal element Co as an example, the coefficients of determination R2 obtained for the standard soil sample element by the traditional partial least squares regression (PLSR) method and by the method based on the LLE-SVR were compared, and the content prediction results were shown in
It should be noted that the examples of the present disclosure may be implemented by hardware, software, or a combination of software and hardware. The hardware part can be implemented using specialized logic; the software part can be stored in memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art may understand that the above devices and methods may be implemented using computer executable instructions and/or contained in processor control code, for example by providing such code on a carrier medium such as a disk, CD or DVD-ROM, a programmable memory such as a read-only memory (firmware), or a data carrier such as an optical or an electronic signal carrier. The devices and modules thereof of the present disclosure may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices; or may be implemented with software executed by various types of processors; or may be implemented by a combination of the above hardware circuits and software, such as firmware.
The above merely describes the specific embodiments of the present disclosure, which is not intended to limit the scope of protection of the present disclosure. Any modifications, equivalent substitutions and improvements made within the spirit and principle of the present disclosure by those skilled in the art according to the disclosed technical scope should be included in the protection scope of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
202210881518.5 | Jul 2022 | CN | national |