The present invention belongs to the technical field of hyperspectral nondestructive testing. Specifically, it relates to a semi-supervised hyperspectral data quantitative analysis method based on a generative adversarial network.
Hyperspectral quantitative analysis technology has a wide range of application scenarios, including food adulteration detection, fruit sugar content detection, microbial content detection, organic matter content detection, and the like. Commonly used hyperspectral quantitative analysis algorithms include Partial Least Squares Regression (PLSR), Least Squares Support Vector Machine (LS-SVM), Multiple Linear Regression (MLR), and other methods. However, the accuracy and robustness of these quantitative analysis models still need to be further improved. In actual application scenarios in particular, where labeled samples are unbalanced or scarce, the difficulty of modeling is further increased.
Although a convolutional neural network has been proven to have a powerful ability to analyze complex information, and convolutional networks have been applied successfully to the classification of remote-sensing hyperspectral data, their application to the quantitative analysis of hyperspectral data still presents great difficulty. The main reason is that, in practical applications, hyperspectral data samples, especially calibrated (labeled) samples, are difficult to obtain, and a small sample size brings a very large risk of overfitting.
Xidian University, in the invention patent documents "hyperspectral image classification method based on multi-class generative adversarial network" (application number: 201810648520.4) and "hyperspectral image classification method combining collaborative generative adversarial network and space spectrum" (application number: 201810977887.8), uses a Generative Adversarial Network (GAN) to generate spectral curve samples, thereby alleviating overfitting and improving the classification accuracy of hyperspectral data. However, the above methods are all developed for the problem of hyperspectral data classification, and it is still difficult to use a generative adversarial network to perform quantitative analysis of hyperspectral data. Firstly, unlike the discrete labels of a classification problem, the label value in quantitative analysis is a continuous analog quantity, so the network structure cannot simply follow the semi-supervised classification design, and a (k+1)-class or k-class adversarial network cannot be used to achieve regression. Secondly, in a classification problem the generated data is mainly used to help determine the class boundaries, whereas in a regression problem the generated data needs to be used to smooth the distribution of sample quantitative values. Therefore, it is necessary to design a new network structure and loss function to achieve semi-supervised quantitative analysis of hyperspectral data based on a generative adversarial network, thereby improving the accuracy of the analysis.
In view of the above disadvantages of the existing technology, the present invention provides a semi-supervised hyperspectral data quantitative analysis method based on a generative adversarial network. The method uses a generative adversarial network to generate spectral samples, and uses the generated samples to enhance the continuity of the sample distribution and suppress overfitting, thereby improving the accuracy of quantitative analysis of hyperspectral data.
The present invention solves the above problems through the following technical means.
A semi-supervised hyperspectral data quantitative analysis method based on a generative adversarial network constructs a generative adversarial network for quantitative analysis, herein a generator is used to generate samples, and a discriminator is used to distinguish the authenticity of the samples and, at the same time, output a quantitative analysis result. The method includes the following steps.
S1. Labeled hyperspectral sample data and unlabeled hyperspectral sample data are acquired.
S2. A sample training set and a prediction set are constructed:
S2-1. The labeled samples are used as training set samples, and the unlabeled samples are used as prediction set samples and are also used for semi-supervised training; and
S2-2. For each hyperspectral sample data block in the training set and the prediction set, the average spectrum of n randomly selected effective pixels is computed m times as sample augmentation, and the obtained labeled and unlabeled average spectral data sets are marked as D_label and D_unlabel.
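Purely as an illustrative sketch of this augmentation step (assuming the data block is a NumPy array and that a hypothetical `effective_mask` marking the effective pixels is available; neither name comes from the original text), the averaging could be performed as follows:

```python
import numpy as np

def augment_block(cube, effective_mask, n=100, m=10, rng=None):
    """Draw m mean spectra, each averaged over n random effective pixels.

    cube: (rows, cols, bands) hyperspectral data block of one sample.
    effective_mask: (rows, cols) boolean mask of effective (non-background) pixels.
    Returns an (m, bands) array of averaged spectra.
    """
    rng = np.random.default_rng() if rng is None else rng
    pixels = cube[effective_mask]               # (num_effective, bands)
    means = []
    for _ in range(m):
        idx = rng.choice(len(pixels), size=n, replace=False)
        means.append(pixels[idx].mean(axis=0))  # mean spectrum of n random pixels
    return np.stack(means)
```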
S3. A regression network based on the generative adversarial network is constructed:
S3-1. A generator network is constructed, consisting sequentially of: fully connected layer - upsampling layer - convolutional layer - upsampling layer - convolutional layer - output layer. The number of nodes in the fully connected layer is 16 × the number of spectral wavebands, the convolutional layers are one-dimensional convolutions with a kernel size of 1×5 and a number of convolution kernels in the range 16-128, the upsampling layers perform 2× upsampling, and the number of nodes in the output layer equals the number of spectral wavebands. The nonlinear activation function is ReLU for all layers except the output layer, whose activation is a sigmoid function. (An illustrative code sketch of S3-1 and S3-2 is given after S3-2 below.)
S3-2. A discriminator/regressor network is constructed, consisting sequentially of: convolutional layer - pooling layer - convolutional layer - pooling layer - convolutional layer - pooling layer - output layers. The convolutional layers are one-dimensional convolutions with a kernel size of 1×5 and a number of convolution kernels in the range 16-128, and the pooling layers perform ½ downsampling. There are two output layers: one outputs the result of the discriminator, namely the spectral data authenticity prediction value, and the other outputs the result of the regressor, namely the quantitative analysis prediction value. The nonlinear activation function is LeakyReLU for all layers except the output layers, whose activation is a sigmoid function.
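By way of a non-limiting illustration of S3-1 and S3-2, the following PyTorch sketch builds both networks; the latent noise dimension, the choice of 64 and 16 convolution kernels (within the stated 16-128 range), max pooling for the ½ downsampling, and a one-channel convolutional output layer for the generator are assumptions made for the sketch rather than requirements of the method:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """FC -> upsample -> conv -> upsample -> conv -> output (sigmoid), per S3-1."""
    def __init__(self, latent_dim=100, n_bands=180, n_kernels=64):
        super().__init__()
        self.base_len = n_bands // 4          # two 2x upsamplings restore n_bands (assumes divisibility by 4)
        self.n_kernels = n_kernels
        self.fc = nn.Linear(latent_dim, n_kernels * self.base_len)
        self.up = nn.Upsample(scale_factor=2)
        self.conv1 = nn.Conv1d(n_kernels, n_kernels, kernel_size=5, padding=2)
        self.conv2 = nn.Conv1d(n_kernels, n_kernels, kernel_size=5, padding=2)
        self.out = nn.Conv1d(n_kernels, 1, kernel_size=5, padding=2)  # one node per waveband
        self.relu = nn.ReLU()

    def forward(self, z):
        x = self.relu(self.fc(z)).view(-1, self.n_kernels, self.base_len)
        x = self.relu(self.conv1(self.up(x)))          # length: n_bands/4 -> n_bands/2
        x = self.relu(self.conv2(self.up(x)))          # length: n_bands/2 -> n_bands
        return torch.sigmoid(self.out(x)).squeeze(1)   # (batch, n_bands) spectra in [0, 1]

class DiscriminatorRegressor(nn.Module):
    """Shared conv/pool trunk with two sigmoid heads (authenticity and quantity), per S3-2."""
    def __init__(self, n_bands=180, n_kernels=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, n_kernels, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.MaxPool1d(2),                             # 1/2 downsampling
            nn.Conv1d(n_kernels, n_kernels, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.MaxPool1d(2),
            nn.Conv1d(n_kernels, n_kernels, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.MaxPool1d(2),
        )
        feat_dim = n_kernels * (n_bands // 8)
        self.disc_head = nn.Linear(feat_dim, 1)          # spectral authenticity output
        self.reg_head = nn.Linear(feat_dim, 1)           # quantitative analysis output

    def forward(self, x):
        f = self.features(x.unsqueeze(1)).flatten(1)     # x: (batch, n_bands)
        return (torch.sigmoid(self.disc_head(f)).squeeze(1),
                torch.sigmoid(self.reg_head(f)).squeeze(1))
```

For the embodiment described later (180 wavebands), `Generator(n_bands=180, n_kernels=64)` and `DiscriminatorRegressor(n_bands=180, n_kernels=16)` reproduce the stated layer sizes (a 64×45-node fully connected layer and a 180-node output).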
S4. A loss function of the generative adversarial regression network is constructed:
S4-1. The loss function of the discriminator/regressor is the sum of the loss functions of a labeled sample and an unlabeled sample, herein the loss function of the unlabeled sample is the sum of the loss function of a real unlabeled sample (L_unlabel_real) and the loss function of a generated sample (L_unlabel_fake) (a code-style sketch of the loss terms in S4-1 and S4-2 is given after S4-2 below):
L_D = L_supervised + L_unsupervised

L_unsupervised = L_unlabel_real + L_unlabel_fake
Herein the labeled sample loss function is a regression loss function, namely the mean square error between the quantitative analysis prediction value of the labeled samples and the quantitative label value:
L_supervised = ‖z_label − ẑ_label‖²
The real unlabeled sample loss function (L_unlabel_real) is the cross entropy between the discriminator prediction value and the authenticity label of the real unlabeled samples.
The generated sample loss function (L_unlabel_fake) is the cross entropy between the discriminator prediction value and the authenticity label of the generated samples:
L_unlabel_fake = −0.5 * Σ_i y_fake^i log(ŷ_fake^i)
S4-2. A loss function of the generator is the sum of the loss function of the generated sample and a sample distribution matching loss function:
L_G = L′_unlabel_fake + L_distribution
Herein, the loss function of the generated sample (L′_unlabel_fake) is the cross entropy between the discriminator's prediction value for the samples generated by the generator and an authenticity label; the authenticity label assigned by the generator to the generated samples is opposite to that used by the discriminator, so that the generator and the discriminator are adversarial.
The sample distribution matching loss function (L_distribution) is the mean square error between the quantitative analysis value distributions of the unlabeled samples and of the generated samples, as predicted by the regressor:
L_distribution = ‖p̂_unlabel(z) − p̂_fake(z)‖²
Herein p̂_unlabel(z) is the quantitative analysis value distribution of the unlabeled samples predicted by the regressor, and p̂_fake(z) is the quantitative analysis value distribution of the generated samples predicted by the regressor. The steps for calculating p̂_unlabel(z) and p̂_fake(z) are as follows:
S4-2-1. The quantitative analysis values of the unlabeled samples and the generated samples of the current training batch are predicted by the regressor and denoted ẑ_unlabel^i and ẑ_fake^i, respectively.
S4-2-2. The distribution of the sample quantitative analysis values is approximated by a k-item multinomial distribution: ẑ_unlabel^i and ẑ_fake^i are quantized to k levels, and the probability distributions p̂_unlabel(z) and p̂_fake(z) are then obtained by counting.
The purpose of the sample distribution matching loss function is to reduce the difference between the quantitative analysis value distributions of the existing samples and the generated samples, so that the generated samples become a supplement to the existing unlabeled samples.
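As a hedged, code-style illustration of the loss terms in S4-1 and S4-2 (assuming the two-headed network sketched above, authenticity label 1 for real spectra and 0 for generated spectra, binary cross entropy for the authenticity terms with the 0.5 weighting of the formulas, and a differentiable soft-counting stand-in for the hard k-level counting of S4-2-2):

```python
import torch
import torch.nn.functional as F

def value_distribution(z_hat, k=100, temperature=50.0):
    """Soft k-level counting of predicted values in [0, 1]; a differentiable
    approximation of the counting described in S4-2-2."""
    centers = torch.linspace(0.0, 1.0, k, device=z_hat.device)
    weights = torch.softmax(-temperature * (z_hat.unsqueeze(1) - centers) ** 2, dim=1)
    return weights.mean(dim=0)                       # (k,) probability distribution

def discriminator_regressor_loss(net, x_label, z_label, x_unlabel, x_fake):
    """L_D = L_supervised + L_unlabel_real + L_unlabel_fake."""
    _, z_hat = net(x_label)
    l_supervised = F.mse_loss(z_hat, z_label)        # regression term on labeled samples
    d_real, _ = net(x_unlabel)
    l_real = 0.5 * F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    d_fake, _ = net(x_fake.detach())                 # detach: the generator is not updated here
    l_fake = 0.5 * F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    return l_supervised + l_real + l_fake

def generator_loss(net, x_unlabel, x_fake, k=100):
    """L_G = L'_unlabel_fake + L_distribution."""
    d_fake, z_fake_hat = net(x_fake)
    l_adv = 0.5 * F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))  # flipped authenticity label
    with torch.no_grad():                            # unlabeled distribution treated as a fixed target
        _, z_unlabel_hat = net(x_unlabel)
    l_dist = F.mse_loss(value_distribution(z_fake_hat, k),
                        value_distribution(z_unlabel_hat, k))
    return l_adv + l_dist
```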
S5. The generative adversarial regression network is trained.
A gradient descent method is adopted to alternately train the discriminator/regressor and the generator, until the root mean square error between the quantitative analysis prediction values of the training set samples and their label values converges below a threshold, or until the number of training steps exceeds a threshold.
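A minimal training-loop sketch under the same assumptions (labeled spectra x_label with label values z_label scaled to [0, 1], unlabeled spectra x_unlabel, and the networks and loss functions sketched above; the batch size and latent dimension are arbitrary illustrative choices):

```python
def train(generator, disc_reg, x_label, z_label, x_unlabel,
          steps=1000, batch=32, latent_dim=100, lr=5e-4, rmse_threshold=0.01):
    """Alternate gradient-descent updates of the discriminator/regressor and the generator."""
    opt_d = torch.optim.Adam(disc_reg.parameters(), lr=lr)
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr)
    for step in range(steps):
        idx_l = torch.randint(len(x_label), (batch,))
        idx_u = torch.randint(len(x_unlabel), (batch,))

        # 1) discriminator/regressor update
        x_fake = generator(torch.randn(batch, latent_dim))
        opt_d.zero_grad()
        discriminator_regressor_loss(disc_reg, x_label[idx_l], z_label[idx_l],
                                     x_unlabel[idx_u], x_fake).backward()
        opt_d.step()

        # 2) generator update
        x_fake = generator(torch.randn(batch, latent_dim))
        opt_g.zero_grad()
        generator_loss(disc_reg, x_unlabel[idx_u], x_fake).backward()
        opt_g.step()

        # stop once the training-set RMSE converges below the threshold
        with torch.no_grad():
            _, z_hat = disc_reg(x_label)
            if torch.sqrt(F.mse_loss(z_hat, z_label)) < rmse_threshold:
                break
```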
S6. The regressor in the trained generative adversarial network is used to obtain the quantitative analysis prediction value of the prediction set.
Compared with the existing technology, the present invention has the following advantages.
The present invention uses a generative adversarial network to generate samples, uses a sample distribution matching strategy so that the generated samples supplement the existing unlabeled sample set, and uses the distribution-matched generated samples to impose a regularization-like effect on the regressor, thereby overcoming the tendency of deep learning to overfit in quantitative analysis and improving the accuracy of hyperspectral quantitative analysis.
In order to explain technical schemes in embodiments of the present invention more clearly, drawings used in descriptions of the embodiments are briefly introduced below. It is apparent that the drawings in the following descriptions are only some embodiments of the present invention, and other drawings may be acquired by those of ordinary skill in the art without creative work according to these drawings.
In order to make the above purposes, features, and advantages of the present invention more apparent and understandable, technical schemes of the present invention are described in detail below with reference to drawings and specific embodiments. It should be pointed out that the described embodiments are only a part of the embodiments of the present invention rather than all of the embodiments. Based on the embodiments of the present invention, all of the other embodiments obtained by those of ordinary skill in the art without creative work shall fall within a scope of protection of the present invention.
This embodiment uses hyperspectral measurement to determine the active ingredient content of paracetamol tablets. As shown in
S1. Hyperspectral image data of a total of 437 paracetamol tablets is measured, herein the content of the active ingredient (paracetamol) ranges from 0% to 90% w/w in 5% w/w intervals, giving 19 levels in total with 23 samples per level. The hyperspectral waveband is 900-1700 nm; the first and last 100 nm are removed due to high noise, and a total of 180 wavebands from 1000-1600 nm are used for data analysis.
S2. A sample training set and a prediction set are constructed:
S2-1. 10 samples are randomly taken as the training set, namely 10 labeled samples, and the remaining 427 samples are used as the prediction set to evaluate the accuracy of a model, and also used as unlabeled samples for semi-supervised training.
S2-2. For each hyperspectral sample data block in the training set and the prediction set, the average spectrum of 100 randomly selected effective pixels is computed 10 times as sample augmentation, and the obtained labeled and unlabeled average spectral data sets are marked as D_label and D_unlabel.
S3. A regression network based on the generative adversarial network is constructed:
S3-1. A generator network is constructed, consisting sequentially of: fully connected layer - upsampling layer - convolutional layer - upsampling layer - convolutional layer - output layer. The number of nodes in the fully connected layer is 64×45, the convolutional layers are one-dimensional convolutions with a kernel size of 1×5 and 64 convolution kernels, the upsampling layers perform 2× upsampling, and the number of nodes in the output layer is 180. The nonlinear activation function is ReLU for all layers except the output layer, whose activation is a sigmoid function.
S3-2. A discriminator/regressor network is constructed, consisting sequentially of: convolutional layer - pooling layer - convolutional layer - pooling layer - convolutional layer - pooling layer - output layers. The convolutional layers are one-dimensional convolutions with a kernel size of 1×5 and 16 convolution kernels, and the pooling layers perform ½ downsampling. There are two output layers: one outputs the result of the discriminator, namely the spectral data authenticity prediction value, and the other outputs the result of the regressor, namely the effective ingredient content value. The nonlinear activation function is LeakyReLU for all layers except the output layers, whose activation is a sigmoid function.
S4. A loss function of the generative adversarial regression network is constructed.
S4-1. The loss function of the discriminator/regressor is the sum of loss functions of a labeled sample and an unlabeled sample, herein the loss function of the unlabeled sample is the sum of a loss function of a real unlabeled sample and a loss function of a generated sample:
L_D = L_supervised + L_unsupervised

L_unsupervised = L_unlabel_real + L_unlabel_fake
Herein the labeled sample loss function is the mean square error between the effective ingredient prediction value of the labeled samples and the effective ingredient label value:
L_supervised = ‖z_label − ẑ_label‖²
The real unlabeled sample loss function is the cross entropy between the discriminator prediction value and the authenticity label of the real unlabeled samples.
The generated sample loss function is the cross entropy between the discriminator prediction value and the authenticity label of the generated samples:
L_unlabel_fake = −0.5 * Σ_i y_fake^i log(ŷ_fake^i)
S4-2. A loss function of the generator is the sum of the loss function of the generated sample and a sample distribution matching loss function:
L_G = L′_unlabel_fake + L_distribution
Herein, the loss function of the generated sample (L′_unlabel_fake) is the cross entropy between the discriminator's prediction value for the samples generated by the generator and an authenticity label; the authenticity label assigned by the generator to the generated samples is opposite to that used by the discriminator, thus forming the adversarial relationship.
The sample distribution matching loss function (L_distribution) is the mean square error between the quantitative analysis value distributions of the unlabeled samples and of the generated samples, as predicted by the regressor:
L_distribution = ‖p̂_unlabel(z) − p̂_fake(z)‖²
Herein p̂_unlabel(z) is the quantitative analysis value distribution of the unlabeled samples predicted by the regressor, and p̂_fake(z) is the quantitative analysis value distribution of the generated samples predicted by the regressor. The specific calculation of p̂_unlabel(z) and p̂_fake(z) is as follows: the effective ingredient value of a sample is approximated by a 100-item multinomial distribution over values from 0 to 1, the unlabeled sample and generated sample quantitative analysis values ẑ_unlabel^i and ẑ_fake^i of the current training batch predicted by the regressor are quantized to the above 100 levels, and p̂_unlabel(z) and p̂_fake(z) are obtained by sample counting statistics.
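As a small illustrative sketch of this counting step (the exact quantization rule is an assumption; the text only specifies 100 levels over 0 to 1 followed by counting statistics):

```python
def counted_distribution(z_hat, k=100):
    """Hard 100-level counting of predicted ingredient values in [0, 1]."""
    levels = (z_hat.clamp(0, 1) * (k - 1)).round().long()   # assign each predicted value to a level
    counts = torch.bincount(levels, minlength=k).float()
    return counts / counts.sum()                             # empirical probability distribution
```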
S5. The generative adversarial regression network is trained.
A gradient descent method is adopted to alternately train the discriminator/regressor and the generator; the Adam optimizer is used with a learning rate of 0.0005, and training is stopped when the number of training steps reaches 1000.
S6. The regressor in the trained generative adversarial network is used to obtain the tablet effective ingredient prediction value of the prediction set.
The training set and the prediction set are randomly resampled 10 times for calculation, and PLSR and a CNN are used as comparison methods. In the PLSR method, the number of principal components is determined by ten-fold cross-validation, and the parameters of the CNN are consistent with those of the regressor in the method of the present invention. The results of the 10 calculations are as follows: the Root Mean Square Error of Calibration (RMSEC) of the training set obtained by the partial least squares method is 1.01±0.46%, and the Root Mean Square Error of Prediction (RMSEP) of the prediction set is 3.79±1.06%; the RMSEC of the training set obtained by the CNN is 1.45±0.71%, and the RMSEP of the prediction set is 5.84±1.77%; and the RMSEC of the training set obtained by the method of the present invention is 2.42±0.49%, and the RMSEP of the prediction set is 2.56±0.88%. A scatter diagram of the real values and the prediction values calculated from one random sampling is shown in
It may be seen from the calculation results that, in the case of small samples, both PLSR and the CNN show apparent overfitting, and the accuracy on the prediction set is much lower than the accuracy on the training set. Since the method of the present invention adopts a generative adversarial network structure, the overfitting problem is well alleviated, and the accuracy on the prediction set is significantly improved.
The above embodiments only express several implementation modes of the present invention, and descriptions thereof are relatively specific and detailed, but it should not be understood as a limitation to a patent scope of the present invention. It should be pointed out that for those of ordinary skill in the art, a plurality of modifications and improvements may be made without departing from the concept of the present invention, and these all fall within a scope of protection of the present invention. Therefore, the scope of protection of the patent of the present invention should be subject to the appended claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 201910420079.6 | May 2019 | CN | national |
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/CN2020/079710 | 3/17/2020 | WO | 00 |