CROSS-REFERENCE TO THE RELATED APPLICATIONS
This application is based upon and claims priority to Chinese Patent Application No. 202311498055.5, filed on Nov. 13, 2023, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the technical field of electrocardiogram (ECG) signal classification, and in particular to a few-shot ECG signal classification method based on an improved Siamese network.
BACKGROUND
In recent years, deep learning (DL)-based algorithm models have achieved unprecedented success in big data (BD) processing in the field of artificial intelligence (AI). However, because certain types of arrhythmia are rare and vary widely between individuals, the acquired data is limited, which restricts the generalization ability and accuracy of existing models. Few-shot learning, which is mainly applied to neural network classifiers, requires only a small number of samples for training and can achieve efficient recognition and classification of electrocardiogram (ECG) signals.
SUMMARY
In order to overcome the above-mentioned shortcomings in the prior art, the present disclosure provides a few-shot electrocardiogram (ECG) signal classification method based on an improved Siamese network, which can improve the classification accuracy.
In order to solve the technical problem, the present disclosure adopts the following technical solution.
The few-shot ECG signal classification method based on an improved Siamese network includes the following steps:
- a) acquiring n original ECG signals to form an original ECG signal set D, D={(x1, y1), (x2, y2), . . . , (xi, yi), . . . , (xn, yn)}, where xi denotes an i-th original ECG signal, and yi denotes a class label corresponding to the i-th original ECG signal xi, i∈{1, . . . , n};
- b) preprocessing the original ECG signal set D to remove noise in the original ECG signals, thereby acquiring a clean ECG signal set D′, D′={(x′1, y1), (x′2, y2), . . . , (x′i, yi), . . . , (x′n, yn)}, where x′i denotes an i-th clean ECG signal;
- c) normalizing the i-th clean ECG signal x′i to acquire a normalized ECG signal x″i; and performing zero-padding at the end of a sequence of the normalized ECG signal x″i if a length of the sequence of the normalized ECG signal x″i is less than Lmax, such that the length of the sequence of the normalized ECG signal x″i is equal to Lmax, thereby acquiring a normalized ECG signal set D″, D″={(x″1, y1), (x″2, y2), . . . , (x″i, yi), . . . , (x″n, yn)};
- d) creating a sample pair set P based on the normalized ECG signal set D″,
yi−1 denotes a class label corresponding to the (i−1)-th original ECG signal xi−1; and there are M sample pairs in the sample pair set P,
- e) constructing a few-shot classification model, and inputting a sample pair ((x″i, x″i+1), Y′) from the sample pair set P into the few-shot classification model to acquire a similarity score Ew(x″i, x″i+1);
- f) training, by an adaptive moment estimation (Adam) optimizer, the few-shot classification model through a loss function L to acquire an optimized few-shot classification model;
- g) randomly sampling K ECG signals from each of N classes in a Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) dataset to form a support set ssupport, ssupport={(s1, a1), (s2, a2), . . . , (si, ai), . . . , (sNK, aNK)}, where si denotes an i-th ECG signal, and ai denotes a class label corresponding to the i-th ECG signal si, i∈{1, . . . , NK};
- h) randomly sampling Q ECG signals from each of the N classes in the MIT-BIH dataset to form a query set squery, squery={(q1, b1), (q2, b2), . . . , (qi, bi), . . . , (qNQ, bNQ)}, where qi denotes an i-th ECG signal, and bi denotes a class label corresponding to the i-th ECG signal qi, i∈{1, . . . , NQ};
- i) replacing the i-th original ECG signal xi with the i-th ECG signal si, and repeating the steps b) and c) to acquire an i-th normalized ECG signal s″i, thereby acquiring a normalized support set s″support, s″support={(s″1, a1), (s″2, a2), . . . , (s″i, ai), . . . , (s″NK, aNK)}; and replacing the i-th original ECG signal xi with the i-th ECG signal qi, and repeating the steps b) and c) to acquire an i-th normalized ECG signal q″i, thereby acquiring a normalized query set s″query, s″query={(q″1, b1), (q″2, b2), . . . , (q″i, bi), . . . , (q″NQ, bNQ)}; and
- j) inputting the i-th normalized ECG signal s″i and the i-th normalized ECG signal q″i into the optimized few-shot classification model to acquire a classification result.
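As an illustration of step d), the sketch below pairs each normalized signal with its successor and labels the pair according to whether the two class labels agree. The pairing rule, the label convention (1 for the same class, 0 otherwise), and the name `make_pairs` are assumptions made for this example; the text does not fix them.

```python
from typing import List, Sequence, Tuple

Signal = Sequence[float]
Pair = Tuple[Tuple[Signal, Signal], int]


def make_pairs(signals: List[Signal], labels: List[int]) -> List[Pair]:
    """Pair consecutive signals (x_i, x_{i+1}) and label each pair.

    Assumed convention: pair label 1 if both signals share a class, else 0.
    """
    if len(signals) != len(labels):
        raise ValueError("signals and labels must have the same length")
    pairs: List[Pair] = []
    for i in range(len(signals) - 1):
        same_class = int(labels[i] == labels[i + 1])
        pairs.append(((signals[i], signals[i + 1]), same_class))
    return pairs


# Toy data: three short "signals" with class labels 0, 0, 1.
demo_signals = [[0.1, 0.2], [0.2, 0.3], [0.9, 0.8]]
demo_labels = [0, 0, 1]
demo_pairs = make_pairs(demo_signals, demo_labels)
```

With n signals this pairing yields n−1 pairs; other pairing schemes would give a different count.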
Further, the step a) includes: acquiring the n original ECG signals from a University of California Riverside (UCR) dataset.
Further, the step b) includes: denoising, by a first median filter and a second median filter in sequence, the i-th original ECG signal xi to acquire the i-th clean ECG signal x′i.
Preferably, the first median filter has a width of 300 ms, and the second median filter has a width of 600 ms.
Preferably, Lmax=187.
Further, the step e) includes:
- e-1) constructing the few-shot classification model, including an embedding module and a metric module;
- e-2) constructing the embedding module of the few-shot classification model, where the embedding module includes a Siamese network formed by a first CMP module and a second CMP module; the first CMP module includes a convolutional layer, a first rectified linear unit (ReLU) activation function layer, a primary capsule layer of a capsule network, a digital capsule layer of the capsule network, a first fully connected layer, a second ReLU activation function layer, and a second fully connected layer; and the second CMP module includes a convolutional layer, a first ReLU activation function layer, a primary capsule layer of a capsule network, a digital capsule layer of the capsule network, a first fully connected layer, a second ReLU activation function layer, and a second fully connected layer;
- e-3) inputting the i-th normalized ECG signal x″i into the convolutional layer and the first ReLU activation function layer of the first CMP module in sequence to acquire a feature f11; inputting the feature f11 into the primary capsule layer of the capsule network in the first CMP module to acquire a vector f12; inputting the vector f12 into the digital capsule layer of the capsule network in the first CMP module to acquire a feature f13; inputting the feature f13 into the first fully connected layer and the second ReLU activation function layer of the first CMP module in sequence to acquire a feature f14; and inputting the feature f14 into the second fully connected layer of the first CMP module to acquire a feature f(x″i);
- e-4) inputting the (i+1)-th normalized ECG signal x″i+1 into the convolutional layer and the first ReLU activation function layer of the first CMP module in sequence to acquire a feature f21; inputting the feature f21 into the primary capsule layer of the capsule network in the first CMP module to acquire a vector f22; inputting the vector f22 into the digital capsule layer of the capsule network in the first CMP module to acquire a feature f23; inputting the feature f23 into the first fully connected layer and the second ReLU activation function layer of the first CMP module in sequence to acquire a feature f24; and inputting the feature f24 into the second fully connected layer of the first CMP module to acquire a feature f(x″i+1); and
- e-5) inputting the feature f(x″i) and the feature f(x″i+1) into the metric module of the few-shot classification model, and calculating the similarity score Ew(x″i, x″i+1) by Ew(x″i, x″i+1)=∥f(x″i)−f(x″i+1)∥, where ∥●∥ denotes a Euclidean distance (ED) calculation.
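The metric module of step e-5) reduces to a Euclidean distance between two embedding vectors. A minimal sketch follows; the function name `similarity_score` and the plain-list representation of the embeddings are assumptions made for illustration.

```python
import math
from typing import Sequence


def similarity_score(f_a: Sequence[float], f_b: Sequence[float]) -> float:
    """Return E_w = ||f_a - f_b||, the Euclidean distance between two embeddings."""
    if len(f_a) != len(f_b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_a, f_b)))


score = similarity_score([0.0, 0.0], [3.0, 4.0])  # 5.0
```

Because Ew is a distance, a smaller value indicates that the two signals are more alike.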
Preferably, in the step e-2), the convolutional layer of the first CMP module includes a 3×3 convolution kernel, and the convolutional layer of the second CMP module includes a 3×3 convolution kernel.
Further, the step f) includes: calculating the loss function L by L=L1+αL2, where L1 denotes a loss function of the Siamese network; m denotes a hyperparameter; α denotes a hyperparameter; and L2 denotes a cross entropy loss function.
Further, the step j) includes:
- j-1) inputting the i-th normalized ECG signal s″i of a u-th class into the convolutional layer and the first ReLU activation function layer of the first CMP module in sequence to acquire a feature f31, u∈{1, . . . , N}; inputting the feature f31 into the primary capsule layer of the capsule network in the first CMP module to acquire a vector f32; inputting the vector f32 into the digital capsule layer of the capsule network in the first CMP module to acquire a feature f33; inputting the feature f33 into the first fully connected layer and the second ReLU activation function layer of the first CMP module in sequence to acquire a feature f34; inputting the feature f34 into the second fully connected layer of the first CMP module to acquire a feature f(s″i)u; and calculating, by a mean( ) function in Python, an average of all K features f(s″1)u, f(s″2)u, . . . , f(s″i)u, . . . , f(s″K)u of the u-th class to acquire a feature vector μu;
- j-2) inputting the i-th normalized ECG signal q″i into the convolutional layer and the first ReLU activation function layer of the first CMP module in sequence to acquire a feature f41; inputting the feature f41 into the primary capsule layer of the capsule network in the first CMP module to acquire a vector f42; inputting the vector f42 into the digital capsule layer of the capsule network in the first CMP module to acquire a feature f43; inputting the feature f43 into the first fully connected layer and the second ReLU activation function layer of the first CMP module in sequence to acquire a feature f44; and inputting the feature f44 into the second fully connected layer of the first CMP module to acquire a feature f(q″i);
- j-3) inputting the feature vector μu and the feature f(q″i) into the metric module of the few-shot classification model, and calculating the similarity score Ew(μu, f(q″i)) by Ew(μu, f(q″i))=∥μu−f(q″i)∥; and
- j-4) calculating a class label ŷi of the i-th normalized ECG signal q″i by ŷi=arg max {Ew(μ1, f(q″i)), Ew(μ2, f(q″i)), . . . , Ew(μu, f(q″i)), . . . , Ew(μN, f(q″i))}, and combining class labels of all NQ normalized ECG signals to form the classification result.
The present disclosure has the following beneficial effects. The present disclosure constructs the CMP module as a sub-network of the Siamese network, and combines the extracted local and global features to better analyze peak information such as position, amplitude, and offset, making the transformed feature vector more robust. In this way, the present disclosure improves the accuracy and stability of few-shot ECG signal classification.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flowchart of a few-shot ECG signal classification method based on an improved Siamese network according to the present disclosure;
FIG. 2 is a structural diagram of a CMP module according to the present disclosure;
FIG. 3 shows a comparison of average accuracy and K for different models according to the present disclosure;
FIGS. 4A-4B show a comparison of confusion matrices for models in 3-way 10-shot according to the present disclosure; and
FIGS. 5A-5F show a comparison between true and predicted labels in 3-way 10-shot.
Table 1 Average accuracy comparison results of models in the present disclosure
Table 2 Average precision, average recall, and average F1 score comparison results of different models in the present disclosure
DETAILED DESCRIPTION OF THE EMBODIMENTS
The present disclosure is further described with reference to FIG. 1 and FIG. 2.
The few-shot ECG signal classification method based on an improved Siamese network includes the following steps:
- a) n original ECG signals are acquired to form original ECG signal set D, D={(x1, y1), (x2, y2), . . . , (xi, yi), . . . , (xn, yn)}, where xi denotes i-th original ECG signal, and yi denotes a class label corresponding to the i-th original ECG signal xi, i∈{1, . . . , n}.
- b) The original ECG signal set D is preprocessed to remove noise in the original ECG signals, thereby acquiring clean ECG signal set D′, D′={(x′1, y1), (x′2, y2), . . . , (x′i, yi), . . . , (x′n, yn)}, where x′i denotes i-th clean ECG signal.
- c) The i-th clean ECG signal x′i is normalized to acquire normalized ECG signal x″i; and zero-padding is performed at the end of a sequence of the normalized ECG signal x″i if a length of the sequence of the normalized ECG signal x″i is less than Lmax, such that the length of the sequence of the normalized ECG signal x″i is equal to Lmax, thereby acquiring normalized ECG signal set D″, D″={(x″1, y1), (x″2, y2), . . . , (x″i, yi), . . . , (x″n, yn)}.
- d) Sample pair set P is created based on the normalized ECG signal set D″,
yi−1 denotes a class label corresponding to the (i−1)-th original ECG signal xi−1; and there are M sample pairs in the sample pair set P,
- e) A few-shot classification model is constructed, and sample pair ((x″i, x″i+1), Y′) from the sample pair set P is input into the few-shot classification model to acquire similarity score Ew(x″i, x″i+1).
- f) The few-shot classification model is trained by an adaptive moment estimation (Adam) optimizer through loss function L to acquire an optimized few-shot classification model.
- g) K ECG signals are randomly sampled from each of N classes in a Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) dataset to form support set ssupport, ssupport={(s1, a1), (s2, a2), . . . , (si, ai), . . . , (sNK, aNK)}, where si denotes i-th ECG signal, and ai denotes a class label corresponding to the i-th ECG signal si, i∈{1, . . . , NK}.
- h) Q ECG signals are randomly sampled from each of the N classes in the MIT-BIH dataset to form query set squery for the purpose of accurately classifying NQ queries based on given NK samples, squery={(q1, b1), (q2, b2), . . . , (qi, bi), . . . , (qNQ, bNQ)}, where qi denotes i-th ECG signal, and bi denotes a class label corresponding to the i-th ECG signal qi, i∈{1, . . . , NQ}.
- i) The i-th original ECG signal xi is replaced with the i-th ECG signal si, and the steps b) and c) are repeated to acquire i-th normalized ECG signal s″i, thereby acquiring normalized support set s″support, s″support={(s″1, a1), (s″2, a2), . . . , (s″i, ai), . . . , (s″NK, aNK)}. The i-th original ECG signal xi is replaced with the i-th ECG signal qi, and the steps b) and c) are repeated to acquire i-th normalized ECG signal q″i, thereby acquiring normalized query set s″query, s″query={(q″1, b1), (q″2, b2), . . . , (q″i, bi), . . . , (q″NQ, bNQ)}.
- j) The i-th normalized ECG signal s″i and the i-th normalized ECG signal q″i are input into the optimized few-shot classification model to acquire classification result.
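Steps g) and h) amount to drawing an N-way K-shot episode. The following sketch shows one way to do this with the standard library. The helper name `sample_episode`, the fixed seed, the toy data, and the choice to keep support and query examples disjoint are assumptions for illustration; the text only states that K and Q signals are sampled at random from each of the N classes.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def sample_episode(
    dataset: List[Example], n_way: int, k_shot: int, q_query: int, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Draw K support and Q query examples from each of N classes, without overlap."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for signal, label in dataset:
        by_class[label].append(signal)
    # Only classes with at least K + Q examples can supply both sets.
    eligible = sorted(c for c, items in by_class.items() if len(items) >= k_shot + q_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with K + Q examples")
    support: List[Example] = []
    query: List[Example] = []
    for label in rng.sample(eligible, n_way):
        drawn = rng.sample(by_class[label], k_shot + q_query)
        support += [(s, label) for s in drawn[:k_shot]]
        query += [(q, label) for q in drawn[k_shot:]]
    return support, query


# Toy dataset: 4 classes with 6 two-sample "signals" each.
toy = [([float(c), float(j)], c) for c in range(4) for j in range(6)]
support_set, query_set = sample_episode(toy, n_way=3, k_shot=2, q_query=3)
```

The support set then holds NK examples and the query set NQ examples, matching the index ranges given in steps g) and h).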
The present disclosure provides a brand new CMP module to establish the Siamese network for few-shot ECG signal classification, which improves classification accuracy.
In an embodiment of the present disclosure, in the step a), the n original ECG signals are acquired from a University of California Riverside (UCR) dataset.
In an embodiment of the present disclosure, in the step b), the i-th original ECG signal xi is denoised by a first median filter and a second median filter in sequence to acquire the i-th clean ECG signal x′i. In the embodiment, preferably, the first median filter has a width of 300 ms, and the second median filter has a width of 600 ms.
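A rough sketch of the two-stage median filtering is given below. Several points are assumptions rather than statements from the text: the cascaded filter output is treated as a baseline estimate that is subtracted from the signal (a common use of 300 ms and 600 ms median filters), the default sampling rate is 360 Hz (the rate of MIT-BIH recordings), windows are rounded up to an odd number of samples, and windows are truncated at the signal edges. All function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def median_filter(signal: Sequence[float], width: int) -> List[float]:
    """Sliding median with an odd window; the window is truncated at the edges."""
    half = width // 2
    n = len(signal)
    return [median(signal[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]


def window_samples(width_ms: float, fs_hz: float) -> int:
    """Convert a width in milliseconds to an odd number of samples."""
    samples = int(round(width_ms * fs_hz / 1000.0))
    return samples if samples % 2 == 1 else samples + 1


def remove_baseline(signal: Sequence[float], fs_hz: float = 360.0) -> List[float]:
    """Assumed usage: cascade the 300 ms and 600 ms filters, then subtract the result."""
    stage1 = median_filter(signal, window_samples(300, fs_hz))
    baseline = median_filter(stage1, window_samples(600, fs_hz))
    return [x - b for x, b in zip(signal, baseline)]


# Toy signal: constant offset of 2.0 with a single narrow spike.
demo = [2.0] * 400
demo[200] = 5.0
cleaned = remove_baseline(demo)
```

In this toy case the offset is removed while the narrow spike keeps its height above the baseline, which is the behavior one would want for QRS peaks.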
In an embodiment of the present disclosure, Lmax=187.
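Step c) can be sketched as follows with Lmax=187. The text does not name the normalization method, so min-max scaling to [0, 1] is an assumption here, as are the function names; signals already longer than Lmax are returned unchanged because the text does not say how they are handled.

```python
from typing import List, Sequence

L_MAX = 187  # target sequence length stated in the text


def min_max_normalize(signal: Sequence[float]) -> List[float]:
    """Scale a signal to [0, 1] (assumed method); a flat signal maps to all zeros."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]
    return [(x - lo) / (hi - lo) for x in signal]


def pad_to_length(signal: Sequence[float], length: int = L_MAX) -> List[float]:
    """Append zeros at the end until the sequence has `length` samples."""
    padded = list(signal)
    if len(padded) < length:
        padded.extend([0.0] * (length - len(padded)))
    return padded


normalized = pad_to_length(min_max_normalize([1.0, 3.0, 2.0]))
```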
In an embodiment of the present disclosure, the step e) is as follows.
- e-1) The few-shot classification model is constructed, including an embedding module and a metric module.
- e-2) The embedding module of the few-shot classification model is constructed, where the embedding module includes a Siamese network formed by a first CMP module and a second CMP module; the first CMP module includes a convolutional layer, a first rectified linear unit (ReLU) activation function layer, a primary capsule layer of a capsule network, a digital capsule layer of the capsule network, a first fully connected layer, a second ReLU activation function layer, and a second fully connected layer; and the second CMP module includes a convolutional layer, a first ReLU activation function layer, a primary capsule layer of a capsule network, a digital capsule layer of the capsule network, a first fully connected layer, a second ReLU activation function layer, and a second fully connected layer.
- e-3) The i-th normalized ECG signal x″i is input into the convolutional layer and the first ReLU activation function layer of the first CMP module in sequence to extract a low-level feature of the ECG signal x″i, thereby acquiring feature f11. The feature f11 is input into the primary capsule layer of the capsule network in the first CMP module for a feature-to-vector transformation, thereby acquiring vector f12. The vector f12 is input into the digital capsule layer of the capsule network in the first CMP module, and the vector f12 is subjected to matrix transformation, input weighting, summation, and non-linear transformation to acquire feature f13. The feature f13 is input into the zero-neuron first fully connected layer and second ReLU activation function layer of the first CMP module in sequence for nonlinear mapping to acquire feature f14. The feature f14 is input into the second fully connected layer of the first CMP module, and an embedding vector mapped from the first fully connected layer to a 0-dimensional space outputs an embedding vector with the same dimension as an input dimension to acquire feature f(x″i).
- e-4) The (i+1)-th normalized ECG signal x″i+1 is input into the convolutional layer and the first ReLU activation function layer of the first CMP module in sequence to extract a low-level feature of the ECG signal x″i+1, thereby acquiring feature f21. The feature f21 is input into the primary capsule layer of the capsule network in the first CMP module for a feature-to-vector transformation, thereby acquiring vector f22. The vector f22 is input into the digital capsule layer of the capsule network in the first CMP module, and the vector f22 is subjected to matrix transformation, input weighting, summation, and non-linear transformation to acquire feature f23. The feature f23 is input into the zero-neuron first fully connected layer and second ReLU activation function layer of the first CMP module in sequence for nonlinear mapping to acquire feature f24. The feature f24 is input into the second fully connected layer of the first CMP module, and an embedding vector mapped from the first fully connected layer to a 0-dimensional space outputs an embedding vector with the same dimension as an input dimension to acquire feature f(x″i+1).
- e-5) The feature f(x″i) and the feature f(x″i+1) are input into the metric module of the few-shot classification model, and the similarity score Ew(x″i, x″i+1) is calculated by Ew(x″i, x″i+1)=∥f(x″i)−f(x″i+1)∥, where ∥●∥ denotes a Euclidean distance (ED) calculation.
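The digital capsule computation named in steps e-3) and e-4) (matrix transformation, input weighting, summation, non-linear transformation) can be illustrated for a single output capsule as below. This is a simplified sketch under stated assumptions: the non-linear transformation is taken to be the squash function commonly used in capsule networks, the coupling coefficients are passed in as fixed values instead of being computed by routing, and all names and toy values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(matrix: Matrix, vector: Sequence[float]) -> Vector:
    """Matrix transformation: multiply a weight matrix by an input capsule."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def squash(s: Sequence[float], eps: float = 1e-9) -> Vector:
    """Assumed non-linear transformation: keep the direction, map the length into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]


def digit_capsule(inputs: List[Vector], weights: List[Matrix], coupling: List[float]) -> Vector:
    """One output capsule: transform each input, weight, sum, then squash."""
    predictions = [mat_vec(w, u) for w, u in zip(weights, inputs)]
    dim = len(predictions[0])
    s = [sum(c * p[d] for c, p in zip(coupling, predictions)) for d in range(dim)]
    return squash(s)


# Toy example: two 2-D input capsules, identity weights, unit coupling.
identity = [[1.0, 0.0], [0.0, 1.0]]
out = digit_capsule([[3.0, 0.0], [0.0, 4.0]], [identity, identity], [1.0, 1.0])
```

The output vector keeps the direction of the weighted sum, while its length is compressed below 1, so the length can be read as a confidence-like quantity.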
In the embodiment, in the step e-2), the convolutional layer of the first CMP module includes a 3×3 convolution kernel, and the convolutional layer of the second CMP module includes a 3×3 convolution kernel.
In the step f), the loss function L is calculated by L=L1+αL2, where L1 is designed to adjust the loss function of the Siamese network.
where m denotes a hyperparameter; α denotes a hyperparameter; and L2 denotes a cross entropy loss function. Further, α=5, m=5. The total loss L takes into account both sample distance and feature classification.
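The combination L=L1+αL2 with α=5 and m=5 can be sketched as follows. The formula for L1 does not appear in the text above, so this example substitutes a standard margin-based contrastive loss as a stand-in, with m as the margin and a pair label of 1 meaning "same class"; both choices are assumptions. L2 is computed as a cross entropy over hypothetical classification logits.

```python
import math
from typing import Sequence

ALPHA = 5.0   # weighting hyperparameter, value stated in the text
MARGIN = 5.0  # hyperparameter m, value stated in the text


def contrastive_loss(distance: float, same_class: int, margin: float = MARGIN) -> float:
    """Assumed stand-in for L1: pull same-class pairs together, push others past the margin."""
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """L2: negative log-softmax probability of the target class."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def total_loss(
    distance: float, same_class: int, logits: Sequence[float], target: int, alpha: float = ALPHA
) -> float:
    """L = L1 + alpha * L2."""
    return contrastive_loss(distance, same_class) + alpha * cross_entropy(logits, target)


# A different-class pair at distance 2 with uninformative two-class logits.
loss = total_loss(distance=2.0, same_class=0, logits=[0.0, 0.0], target=0)
```

In this form, a different-class pair stops contributing to L1 once its distance exceeds the margin, while the cross entropy term continues to drive the classification.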
The step j) is as follows.
- j-1) The i-th normalized ECG signal s″i of a u-th class is input into the convolutional layer and the first ReLU activation function layer of the first CMP module in sequence to acquire feature f31, u∈{1, . . . , N}. The feature f31 is input into the primary capsule layer of the capsule network in the first CMP module to acquire vector f32. The vector f32 is input into the digital capsule layer of the capsule network in the first CMP module to acquire feature f33. The feature f33 is input into the first fully connected layer and the second ReLU activation function layer of the first CMP module in sequence to acquire feature f34. The feature f34 is input into the second fully connected layer of the first CMP module to acquire feature f(s″i)u. An average of all K features f(s″1)u, f(s″2)u, . . . , f(s″i)u, . . . , f(s″K)u of the u-th class is calculated by a mean( ) function in Python to acquire feature vector μu.
- j-2) The i-th normalized ECG signal q″i is input into the convolutional layer and the first ReLU activation function layer of the first CMP module in sequence to acquire feature f41. The feature f41 is input into the primary capsule layer of the capsule network in the first CMP module to acquire vector f42. The vector f42 is input into the digital capsule layer of the capsule network in the first CMP module to acquire feature f43. The feature f43 is input into the first fully connected layer and the second ReLU activation function layer of the first CMP module in sequence to acquire feature f44. The feature f44 is input into the second fully connected layer of the first CMP module to acquire feature f(q″i).
- j-3) The feature vector μu and the feature f(q″i) are input into the metric module of the few-shot classification model, and the similarity score Ew(μu, f(q″i)) is calculated by Ew(μu, f(q″i))=∥μu−f(q″i)∥.
- j-4) Class label ŷi of the i-th normalized ECG signal q″i is calculated by ŷi=arg max {Ew(μ1, f(q″i)), Ew(μ2, f(q″i)), . . . , Ew(μu, f(q″i)), . . . , Ew(μN, f(q″i))}, and class labels of all NQ normalized ECG signals are combined into the classification result.
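Steps j-1) to j-4) can be sketched as a prototype-based classifier: average the K support embeddings of each class, then compare the query embedding against each average. The text writes the selection as arg max over Ew; because Ew is computed as a Euclidean distance, this sketch assumes the intended choice is the class whose prototype is closest to the query, i.e. the smallest distance. That reading, the function names, and the toy embeddings are assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def class_prototype(embeddings: List[Sequence[float]]) -> Vector:
    """Average the K support embeddings of one class element-wise (the vector mu_u)."""
    k = len(embeddings)
    return [sum(column) / k for column in zip(*embeddings)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """E_w between a prototype and a query embedding."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query: Sequence[float], prototypes: Dict[int, Vector]) -> int:
    """Assumed decision rule: return the class whose prototype is nearest to the query."""
    return min(prototypes, key=lambda label: euclidean(prototypes[label], query))


# Toy 2-way 2-shot episode with 2-D embeddings.
support_embeddings = {0: [[0.0, 0.0], [0.0, 2.0]], 1: [[4.0, 4.0], [6.0, 4.0]]}
prototypes = {label: class_prototype(e) for label, e in support_embeddings.items()}
predicted = [classify(q, prototypes) for q in ([4.5, 3.5], [0.5, 0.5])]
```

Running `classify` over all NQ query embeddings and collecting the labels gives the classification result described in step j-4).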
Taking the publicly available MIT-BIH dataset as an example, the implementation of the present disclosure is explained in detail below.
The model proposed by the present disclosure is compared with mainstream classification task models (ED, dynamic time warping (DTW), long short-term memory-fully connected network (LSTM-FCN)) and a Siamese convolutional neural network (SCNN) model, and the final accuracy is the average of 20 tasks. Accuracy, precision, recall, and F1 score are used as evaluation indicators.
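The four evaluation indicators can be computed from true and predicted labels as sketched below. Macro-averaging over classes is an assumption made for this example, since the text does not state how the per-class values are averaged; the function name and the toy labels are hypothetical.

```python
from typing import Dict, List


def evaluate(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Toy 3-class example with labels encoded as 0, 1, 2.
metrics = evaluate([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```

Averaging these values over the 20 sampled tasks would then give the reported averages.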
The training is performed based on UCR ECG200 and ECG5000 datasets, the validation is performed based on UCR TwoLeadECG and ECGFiveDays datasets, and the model testing is performed based on the MIT-BIH dataset. FIG. 3 shows a comparison of the relationship between the average accuracy and K for different models. It can be seen from the figure that as K increases, the accuracy of ED increases almost monotonically, and the precision, recall, and F1 score also increase. DTW does not follow such a smooth behavior and offers poorer performance than ED at smaller K values. However, DTW outperforms ED at a K value close to 50 and may perform better at larger values. Unlike ED and DTW, LSTM-FCN exhibits an extremely irregular behavior during training, with a significant fluctuation in accuracy in certain areas, which can be attributed to the randomness of neural network optimization and the lack of labeled data for training. The comparison between the model of the present disclosure and the SCNN model shows that the accuracy does not increase sharply from K=1 to K=50, but tends to stabilize around 0.93, and the recall, precision, and F1 score also tend to stabilize around 0.93.
FIGS. 4A-4B show a confusion matrix of the CMP model in 3-way 10-shot on the MIT-BIH dataset. It can be seen from the figure that the model of the present disclosure has better comprehensive performance and a lower misdiagnosis rate during the evaluation process. FIGS. 5A-5F show changes in true and predicted labels of 6 randomly selected signals during 3-way 10-shot (N, S and V are represented by 0, 1 and 2, respectively). Table 1 shows comparison results of accuracy acquired by different models under different K values on the MIT-BIH dataset, while Table 2 shows comparison results of average precision, average recall, and average F1 score of different models on the MIT-BIH dataset. In summary, from the perspective of model performance, the model of the present disclosure can effectively distinguish between acceptable and unacceptable ECG signals in practical environments.
Finally, it should be noted that the above descriptions are only preferred embodiments of the present disclosure, and are not intended to limit the present disclosure. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments, or equivalently substitute some technical features thereof. Any modification, equivalent substitution, improvement, etc. within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.