The present invention belongs to the technical field of hyperspectral nondestructive testing. Specifically, it relates to a semi-supervised hyperspectral data quantitative analysis method based on a generative adversarial network.
Hyperspectral quantitative analysis technology has a wide range of application scenarios, including food adulteration detection, fruit sugar content detection, microbial content detection, organic matter content detection, and the like. Commonly used hyperspectral quantitative analysis algorithms include Partial Least Squares Regression (PLSR), Least Squares Support Vector Machine (LS-SVM), Multiple Linear Regression (MLR), and other methods. However, the accuracy and robustness of these quantitative analysis models still need to be further improved. In actual application scenarios in particular, where labeled samples are unbalanced or scarce, the difficulty of modeling is further increased.
Although a convolutional neural network has been proven to have a powerful ability to analyze complex information, and convolutional networks have been applied successfully to the classification of remote-sensing hyperspectral data, their application to the quantitative analysis of hyperspectral data still presents great difficulty. The main reason is that, in practical applications, hyperspectral data samples, especially calibrated (labeled) samples, are difficult to obtain, and a small sample size brings a very large risk of overfitting.
Xidian University, in the invention patent documents "hyperspectral image classification method based on multi-class generative adversarial network" (application number: 201810648520.4) and "hyperspectral image classification method combining collaborative generative adversarial network and space spectrum" (application number: 201810977887.8), uses a Generative Adversarial Network (GAN) to generate spectral curve samples, thereby alleviating overfitting and improving the classification accuracy of hyperspectral data. However, the above methods are all developed for the problem of hyperspectral data classification, and it is still difficult to use a generative adversarial network to perform quantitative analysis of hyperspectral data. Firstly, unlike the discrete labels of a classification problem, the label value in quantitative analysis is a continuous analog quantity, so the network structure cannot simply follow the semi-supervised classification design, and a (k+1)-class or k-class adversarial network cannot be used to achieve regression. Secondly, in a classification problem the generated data is mainly used to help determine the class boundaries, whereas in a regression problem the generated data needs to be used to smooth the distribution of sample quantitative values. Therefore, it is necessary to design a new network structure and loss function to achieve semi-supervised quantitative analysis of hyperspectral data based on a generative adversarial network, thereby improving the accuracy of the analysis.
In view of the above disadvantages of the existing technology, the present invention provides a semi-supervised hyperspectral data quantitative analysis method based on a generative adversarial network. The method uses a generative adversarial network to generate spectral samples, and uses the generated samples to enhance the continuity of the sample distribution and suppress overfitting, thereby improving the accuracy of quantitative analysis of hyperspectral data.
The present invention solves the above problems through the following technical means.
A semi-supervised hyperspectral data quantitative analysis method based on a generative adversarial network constructs a generative adversarial network for quantitative analysis, herein a generator is used to generate samples, and a discriminator is used to distinguish the authenticity of the samples and, at the same time, output a quantitative analysis result. The method includes the following steps.
S1. Labeled hyperspectral sample data and unlabeled hyperspectral sample data are acquired.
S2. A sample training set and a prediction set are constructed:
S2-1. The labeled samples are used as training set samples, and the unlabeled samples are used as prediction set samples and are also used for semi-supervised training; and
S2-2. For each hyperspectral sample data block in the training set and the prediction set, the average spectrum of n randomly selected effective pixels is computed m times as sample augmentation, and the obtained labeled and unlabeled average spectral data sets are marked as D_label and D_unlabel.
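Purely as an illustrative sketch of this augmentation step (assuming the data block is a NumPy array and that a hypothetical `effective_mask` marking the effective pixels is available; neither name comes from the original text), the averaging could be performed as follows:

```python
import numpy as np

def augment_block(cube, effective_mask, n=100, m=10, rng=None):
    """Draw m mean spectra, each averaged over n random effective pixels.

    cube: (rows, cols, bands) hyperspectral data block of one sample.
    effective_mask: (rows, cols) boolean mask of effective (non-background) pixels.
    Returns an (m, bands) array of averaged spectra.
    """
    rng = np.random.default_rng() if rng is None else rng
    pixels = cube[effective_mask]               # (num_effective, bands)
    means = []
    for _ in range(m):
        idx = rng.choice(len(pixels), size=n, replace=False)
        means.append(pixels[idx].mean(axis=0))  # mean spectrum of n random pixels
    return np.stack(means)
```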
S3. A regression network based on the generative adversarial network is constructed:
S3-1. A generator network is constructed, consisting sequentially of: fully connected layer - upsampling layer - convolutional layer - upsampling layer - convolutional layer - output layer. The number of nodes in the fully connected layer is 16 × the number of spectral wavebands, the convolutional layers are one-dimensional convolutions with a kernel size of 1×5 and a number of convolution kernels in the range 16-128, the upsampling layers perform 2× upsampling, and the number of nodes in the output layer equals the number of spectral wavebands. The nonlinear activation function is ReLU for all layers except the output layer, whose activation is a sigmoid function. (An illustrative code sketch of S3-1 and S3-2 is given after S3-2 below.)
S3-2. A discriminator/regressor network is constructed, consisting sequentially of: convolutional layer - pooling layer - convolutional layer - pooling layer - convolutional layer - pooling layer - output layers. The convolutional layers are one-dimensional convolutions with a kernel size of 1×5 and a number of convolution kernels in the range 16-128, and the pooling layers perform ½ downsampling. There are two output layers: one outputs the result of the discriminator, namely the spectral data authenticity prediction value, and the other outputs the result of the regressor, namely the quantitative analysis prediction value. The nonlinear activation function is LeakyReLU for all layers except the output layers, whose activation is a sigmoid function.
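By way of a non-limiting illustration of S3-1 and S3-2, the following PyTorch sketch builds both networks; the latent noise dimension, the choice of 64 and 16 convolution kernels (within the stated 16-128 range), max pooling for the ½ downsampling, and a one-channel convolutional output layer for the generator are assumptions made for the sketch rather than requirements of the method:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """FC -> upsample -> conv -> upsample -> conv -> output (sigmoid), per S3-1."""
    def __init__(self, latent_dim=100, n_bands=180, n_kernels=64):
        super().__init__()
        self.base_len = n_bands // 4          # two 2x upsamplings restore n_bands (assumes divisibility by 4)
        self.n_kernels = n_kernels
        self.fc = nn.Linear(latent_dim, n_kernels * self.base_len)
        self.up = nn.Upsample(scale_factor=2)
        self.conv1 = nn.Conv1d(n_kernels, n_kernels, kernel_size=5, padding=2)
        self.conv2 = nn.Conv1d(n_kernels, n_kernels, kernel_size=5, padding=2)
        self.out = nn.Conv1d(n_kernels, 1, kernel_size=5, padding=2)  # one node per waveband
        self.relu = nn.ReLU()

    def forward(self, z):
        x = self.relu(self.fc(z)).view(-1, self.n_kernels, self.base_len)
        x = self.relu(self.conv1(self.up(x)))          # length: n_bands/4 -> n_bands/2
        x = self.relu(self.conv2(self.up(x)))          # length: n_bands/2 -> n_bands
        return torch.sigmoid(self.out(x)).squeeze(1)   # (batch, n_bands) spectra in [0, 1]

class DiscriminatorRegressor(nn.Module):
    """Shared conv/pool trunk with two sigmoid heads (authenticity and quantity), per S3-2."""
    def __init__(self, n_bands=180, n_kernels=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, n_kernels, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.MaxPool1d(2),                             # 1/2 downsampling
            nn.Conv1d(n_kernels, n_kernels, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.MaxPool1d(2),
            nn.Conv1d(n_kernels, n_kernels, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.MaxPool1d(2),
        )
        feat_dim = n_kernels * (n_bands // 8)
        self.disc_head = nn.Linear(feat_dim, 1)          # spectral authenticity output
        self.reg_head = nn.Linear(feat_dim, 1)           # quantitative analysis output

    def forward(self, x):
        f = self.features(x.unsqueeze(1)).flatten(1)     # x: (batch, n_bands)
        return (torch.sigmoid(self.disc_head(f)).squeeze(1),
                torch.sigmoid(self.reg_head(f)).squeeze(1))
```

For the embodiment described later (180 wavebands), `Generator(n_bands=180, n_kernels=64)` and `DiscriminatorRegressor(n_bands=180, n_kernels=16)` reproduce the stated layer sizes (a 64×45-node fully connected layer and a 180-node output).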
S4. A loss function of the generative adversarial regression network is constructed:
S4-1. The loss function of the discriminator/regressor is the sum of the loss functions of a labeled sample and an unlabeled sample, herein the loss function of the unlabeled sample is the sum of the loss function of a real unlabeled sample (L_unlabel_real) and the loss function of a generated sample (L_unlabel_fake) (a code-style sketch of the loss terms in S4-1 and S4-2 is given after S4-2 below):
L_D = L_supervised + L_unsupervised

L_unsupervised = L_unlabel_real + L_unlabel_fake
Herein the labeled sample loss function is a regression loss function, namely the mean square error between the quantitative analysis prediction value of the labeled samples and the quantitative label value:
L_supervised = ‖z_label − ẑ_label‖²
The real unlabeled sample loss function (L_unlabel_real) is the cross entropy between the discriminator prediction value and the authenticity label of the real unlabeled samples.
The generated sample loss function (L_unlabel_fake) is the cross entropy between the discriminator prediction value and the authenticity label of the generated samples:
L_unlabel_fake = −0.5 * Σ_i y_fake^i log(ŷ_fake^i)
S4-2. A loss function of the generator is the sum of the loss function of the generated sample and a sample distribution matching loss function:
L_G = L′_unlabel_fake + L_distribution
Herein, the loss function of the generated sample (L′_unlabel_fake) is the cross entropy between the discriminator's prediction value for the samples generated by the generator and an authenticity label; the authenticity label assigned by the generator to the generated samples is opposite to that used by the discriminator, so that the generator and the discriminator are adversarial.
The sample distribution matching loss function (L_distribution) is the mean square error between the quantitative analysis value distributions of the unlabeled samples and of the generated samples, as predicted by the regressor:
L_distribution = ‖p̂_unlabel(z) − p̂_fake(z)‖²
Herein p̂_unlabel(z) is the quantitative analysis value distribution of the unlabeled samples predicted by the regressor, and p̂_fake(z) is the quantitative analysis value distribution of the generated samples predicted by the regressor. The steps for calculating p̂_unlabel(z) and p̂_fake(z) are as follows:
S4-2-1. The quantitative analysis values of the unlabeled samples and the generated samples of the current training batch are predicted by the regressor and denoted ẑ_unlabel^i and ẑ_fake^i, respectively.
S4-2-2. The distribution of the sample quantitative analysis values is approximated by a k-item multinomial distribution: ẑ_unlabel^i and ẑ_fake^i are quantized to k levels, and the probability distributions p̂_unlabel(z) and p̂_fake(z) are then obtained by counting.
The purpose of the sample distribution matching loss function is to reduce the difference between the quantitative analysis value distributions of the existing samples and the generated samples, so that the generated samples become a supplement to the existing unlabeled samples.
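As a hedged, code-style illustration of the loss terms in S4-1 and S4-2 (assuming the two-headed network sketched above, authenticity label 1 for real spectra and 0 for generated spectra, binary cross entropy for the authenticity terms with the 0.5 weighting of the formulas, and a differentiable soft-counting stand-in for the hard k-level counting of S4-2-2):

```python
import torch
import torch.nn.functional as F

def value_distribution(z_hat, k=100, temperature=50.0):
    """Soft k-level counting of predicted values in [0, 1]; a differentiable
    approximation of the counting described in S4-2-2."""
    centers = torch.linspace(0.0, 1.0, k, device=z_hat.device)
    weights = torch.softmax(-temperature * (z_hat.unsqueeze(1) - centers) ** 2, dim=1)
    return weights.mean(dim=0)                       # (k,) probability distribution

def discriminator_regressor_loss(net, x_label, z_label, x_unlabel, x_fake):
    """L_D = L_supervised + L_unlabel_real + L_unlabel_fake."""
    _, z_hat = net(x_label)
    l_supervised = F.mse_loss(z_hat, z_label)        # regression term on labeled samples
    d_real, _ = net(x_unlabel)
    l_real = 0.5 * F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    d_fake, _ = net(x_fake.detach())                 # detach: the generator is not updated here
    l_fake = 0.5 * F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    return l_supervised + l_real + l_fake

def generator_loss(net, x_unlabel, x_fake, k=100):
    """L_G = L'_unlabel_fake + L_distribution."""
    d_fake, z_fake_hat = net(x_fake)
    l_adv = 0.5 * F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))  # flipped authenticity label
    with torch.no_grad():                            # unlabeled distribution treated as a fixed target
        _, z_unlabel_hat = net(x_unlabel)
    l_dist = F.mse_loss(value_distribution(z_fake_hat, k),
                        value_distribution(z_unlabel_hat, k))
    return l_adv + l_dist
```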
S5. The generative adversarial regression network is trained.
A gradient descent method is adopted to alternately train the discriminator/regressor and the generator, until the root mean square error between the quantitative analysis prediction values of the training set samples and their label values converges below a threshold, or until the number of training steps exceeds a threshold.
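A minimal training-loop sketch under the same assumptions (labeled spectra x_label with label values z_label scaled to [0, 1], unlabeled spectra x_unlabel, and the networks and loss functions sketched above; the batch size and latent dimension are arbitrary illustrative choices):

```python
def train(generator, disc_reg, x_label, z_label, x_unlabel,
          steps=1000, batch=32, latent_dim=100, lr=5e-4, rmse_threshold=0.01):
    """Alternate gradient-descent updates of the discriminator/regressor and the generator."""
    opt_d = torch.optim.Adam(disc_reg.parameters(), lr=lr)
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr)
    for step in range(steps):
        idx_l = torch.randint(len(x_label), (batch,))
        idx_u = torch.randint(len(x_unlabel), (batch,))

        # 1) discriminator/regressor update
        x_fake = generator(torch.randn(batch, latent_dim))
        opt_d.zero_grad()
        discriminator_regressor_loss(disc_reg, x_label[idx_l], z_label[idx_l],
                                     x_unlabel[idx_u], x_fake).backward()
        opt_d.step()

        # 2) generator update
        x_fake = generator(torch.randn(batch, latent_dim))
        opt_g.zero_grad()
        generator_loss(disc_reg, x_unlabel[idx_u], x_fake).backward()
        opt_g.step()

        # stop once the training-set RMSE converges below the threshold
        with torch.no_grad():
            _, z_hat = disc_reg(x_label)
            if torch.sqrt(F.mse_loss(z_hat, z_label)) < rmse_threshold:
                break
```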
S6. The regressor in the trained generative adversarial network is used to obtain the quantitative analysis prediction value of the prediction set.
Compared with the existing technology, the present invention has the following advantages.
The present invention uses a generative adversarial network to generate samples, uses a sample distribution matching strategy so that the generated samples supplement the existing unlabeled sample set, and uses the distribution-matched generated samples to impose a regularization-like effect on the regressor, thereby overcoming the tendency of deep learning to overfit in quantitative analysis and improving the accuracy of hyperspectral quantitative analysis.
In order to explain technical schemes in embodiments of the present invention more clearly, drawings used in descriptions of the embodiments are briefly introduced below. It is apparent that the drawings in the following descriptions are only some embodiments of the present invention, and other drawings may be acquired by those of ordinary skill in the art without creative work according to these drawings.
In order to make the above purposes, features, and advantages of the present invention more apparent and understandable, technical schemes of the present invention are described in detail below with reference to drawings and specific embodiments. It should be pointed out that the described embodiments are only a part of the embodiments of the present invention rather than all of the embodiments. Based on the embodiments of the present invention, all of the other embodiments obtained by those of ordinary skill in the art without creative work shall fall within a scope of protection of the present invention.
This embodiment uses hyperspectral measurement to determine the active ingredient content of paracetamol tablets. As shown in
S1. Hyperspectral image data of a total of 437 paracetamol tablets is measured, herein the content of the active ingredient (paracetamol) ranges from 0% to 90% w/w in 5% w/w intervals, giving 19 levels in total with 23 samples per level. The hyperspectral waveband is 900-1700 nm; the first and last 100 nm are removed due to high noise, and a total of 180 wavebands from 1000-1600 nm are used for data analysis.
S2. A sample training set and a prediction set are constructed:
S2-1. 10 samples are randomly taken as the training set, namely 10 labeled samples, and the remaining 427 samples are used as the prediction set to evaluate the accuracy of a model, and also used as unlabeled samples for semi-supervised training.
S2-2. For each hyperspectral sample data block in the training set and the prediction set, the average spectrum of 100 randomly selected effective pixels is computed 10 times as sample augmentation, and the obtained labeled and unlabeled average spectral data sets are marked as D_label and D_unlabel.
S3. A regression network based on the generative adversarial network is constructed:
S3-1. A generator network is constructed, consisting sequentially of: fully connected layer - upsampling layer - convolutional layer - upsampling layer - convolutional layer - output layer. The number of nodes in the fully connected layer is 64×45, the convolutional layers are one-dimensional convolutions with a kernel size of 1×5 and 64 convolution kernels, the upsampling layers perform 2× upsampling, and the number of nodes in the output layer is 180. The nonlinear activation function is ReLU for all layers except the output layer, whose activation is a sigmoid function.
S3-2. A discriminator/regressor network is constructed, consisting sequentially of: convolutional layer - pooling layer - convolutional layer - pooling layer - convolutional layer - pooling layer - output layers. The convolutional layers are one-dimensional convolutions with a kernel size of 1×5 and 16 convolution kernels, and the pooling layers perform ½ downsampling. There are two output layers: one outputs the result of the discriminator, namely the spectral data authenticity prediction value, and the other outputs the result of the regressor, namely the effective ingredient content value. The nonlinear activation function is LeakyReLU for all layers except the output layers, whose activation is a sigmoid function.
S4. A loss function of the generative adversarial regression network is constructed.
S4-1. The loss function of the discriminator/regressor is the sum of loss functions of a labeled sample and an unlabeled sample, herein the loss function of the unlabeled sample is the sum of a loss function of a real unlabeled sample and a loss function of a generated sample:
L_D = L_supervised + L_unsupervised

L_unsupervised = L_unlabel_real + L_unlabel_fake
Herein the labeled sample loss function is the mean square error between the effective ingredient prediction value of the labeled samples and the effective ingredient label value:
L_supervised = ‖z_label − ẑ_label‖²
The real unlabeled sample loss function is the cross entropy between the discriminator prediction value and the authenticity label of the real unlabeled samples.
The generated sample loss function is the cross entropy between the discriminator prediction value and the authenticity label of the generated samples:
L_unlabel_fake = −0.5 * Σ_i y_fake^i log(ŷ_fake^i)
S4-2. A loss function of the generator is the sum of the loss function of the generated sample and a sample distribution matching loss function:
L_G = L′_unlabel_fake + L_distribution
Herein, the loss function of the generated sample (L′_unlabel_fake) is the cross entropy between the discriminator's prediction value for the samples generated by the generator and an authenticity label; the authenticity label assigned by the generator to the generated samples is opposite to that used by the discriminator, thus forming the adversarial relationship.
The sample distribution matching loss function (L_distribution) is the mean square error between the quantitative analysis value distributions of the unlabeled samples and of the generated samples, as predicted by the regressor:
L_distribution = ‖p̂_unlabel(z) − p̂_fake(z)‖²
Herein p̂_unlabel(z) is the quantitative analysis value distribution of the unlabeled samples predicted by the regressor, and p̂_fake(z) is the quantitative analysis value distribution of the generated samples predicted by the regressor. The specific calculation of p̂_unlabel(z) and p̂_fake(z) is as follows: the effective ingredient value of a sample is approximated by a 100-item multinomial distribution over values from 0 to 1, the unlabeled sample and generated sample quantitative analysis values ẑ_unlabel^i and ẑ_fake^i of the current training batch predicted by the regressor are quantized to the above 100 levels, and p̂_unlabel(z) and p̂_fake(z) are obtained by sample counting statistics.
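As a small illustrative sketch of this counting step (the exact quantization rule is an assumption; the text only specifies 100 levels over 0 to 1 followed by counting statistics):

```python
def counted_distribution(z_hat, k=100):
    """Hard 100-level counting of predicted ingredient values in [0, 1]."""
    levels = (z_hat.clamp(0, 1) * (k - 1)).round().long()   # assign each predicted value to a level
    counts = torch.bincount(levels, minlength=k).float()
    return counts / counts.sum()                             # empirical probability distribution
```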
S5. The generative adversarial regression network is trained.
A gradient descent method is adopted to alternately train the discriminator/regressor and the generator; the Adam optimizer is used with a learning rate of 0.0005, and training is stopped when the number of training steps reaches 1000.
S6. The regressor in the trained generative adversarial network is used to obtain the tablet effective ingredient prediction value of the prediction set.
The training set and the prediction set are randomly resampled 10 times for calculation, and PLSR and a CNN are used as comparison methods. In the PLSR method, the number of principal components is determined by ten-fold cross-validation, and the parameters of the CNN are consistent with those of the regressor in the method of the present invention. The results of the 10 calculations are as follows: the Root Mean Square Error of Calibration (RMSEC) of the training set obtained by the partial least squares method is 1.01±0.46%, and the Root Mean Square Error of Prediction (RMSEP) of the prediction set is 3.79±1.06%; the RMSEC of the training set obtained by the CNN is 1.45±0.71%, and the RMSEP of the prediction set is 5.84±1.77%; and the RMSEC of the training set obtained by the method of the present invention is 2.42±0.49%, and the RMSEP of the prediction set is 2.56±0.88%. A scatter diagram of the real values and the prediction values calculated from one random sampling is shown in
It may be seen from the calculation results that, in the case of small samples, both PLSR and the CNN show apparent overfitting, and the accuracy on the prediction set is much lower than the accuracy on the training set. Since the method of the present invention adopts a generative adversarial network structure, the overfitting problem is well alleviated, and the accuracy on the prediction set is significantly improved.
The above embodiments only express several implementation modes of the present invention, and descriptions thereof are relatively specific and detailed, but it should not be understood as a limitation to a patent scope of the present invention. It should be pointed out that for those of ordinary skill in the art, a plurality of modifications and improvements may be made without departing from the concept of the present invention, and these all fall within a scope of protection of the present invention. Therefore, the scope of protection of the patent of the present invention should be subject to the appended claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 201910420079.6 | May 2019 | CN | national |
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/CN2020/079710 | 3/17/2020 | WO | 00 |