This application claims priority to Chinese Patent Application No. 202110278022.4 with a filing date of Mar. 15, 2021. The content of the aforementioned application, including any intervening amendments thereto, is incorporated herein by reference.
The present disclosure relates to the technical field of wireless sensor networks, and in particular, to a method of hop count matrix recovery based on a decision tree classifier.
The Internet of Things (IoT) is widely used in various fields of society and is important for e-medicine, military monitoring, agricultural production, etc. A wireless sensor network, as the basis of the IoT, contains a large number of small and inexpensive sensor nodes. The sensor nodes are randomly distributed in a monitored area, and the core function of the wireless sensor network is to sense and report data. The observed data is meaningful only if the location of the data generation is known. Therefore, acquiring location information of nodes is the first task after node deployment, which plays a crucial role in understanding the application context. As the traditional Global Positioning System (GPS) requires expensive hardware facilities and high energy consumption, a GPS-based positioning method is not suitable for large-scale wireless sensor networks. In addition, the weak indoor positioning capability of the GPS cannot meet various application scenarios of wireless sensor networks.
Currently, localization solutions for wireless sensor networks can be divided into two major categories: range-based localization solution and non-range-based localization solution. In practical scenarios, a hop count matrix for the non-range-based localization solution is easier to obtain than a distance matrix for the range-based localization solution, because the distance measurement is affected by noise, multipath effects, signal fading or shadowing. In the non-range-based localization solution, the sensor nodes record hop count information of themselves and other nodes during the flooding process, and the hop count matrix is constructed after the flooding is finished. Even if the hop count matrix can be obtained by such a simple method, the hop count matrix observed during the implementation may still contain some missing entries.
It is well known that the sensor nodes have limited energy, and some sensors with monitoring tasks even have energy harvesting capabilities. During the flooding process, the nodes need to continuously send and receive data, which requires high energy consumption. Due to the limited energy of some nodes, in many cases, flooding needs to be terminated before the hop count matrix converges and only some of the hop values in the network can be utilized. In addition, attacks from malicious nodes in the monitored area may contaminate the hop count information, and the malicious nodes may disguise themselves as normal nodes to tamper with forwarded information during the flooding process, which will greatly jeopardize the localization of the nodes. Although many existing solutions can be used to detect malicious nodes and exclude contaminated hop counts, they do not recover these missing hop counts. Therefore, in many cases, only an incomplete hop count matrix is obtained. If an incomplete hop count matrix is used for localization, the localization accuracy will be greatly reduced.
If the obtained hop count matrix contains missing entries, flooding is generally performed again to obtain a new hop count matrix, so as to ensure the localization performance. However, the sensor nodes have limited energy, the high energy consumption of the flooding process is a well-known drawback of the non-range-based localization solution, and a second flooding results in great energy loss. Therefore, how to recover the hop count matrix without affecting the localization performance and avoid the energy loss caused by the second flooding is an urgent problem to be resolved in the non-range-based localization solution.
In the prior art, for example, in a naive Bayes-based hop count matrix completion method [Zhao, Y., Liu, X., Han, F., & Han, G. (2020). Recovery of hop count matrices for the sensing nodes in internet of things. IEEE Internet of Things Journal, 7, 5128-5139], hop counts between nodes in the network are learned, a missing hop count is predicted by using a single feature, and a naive Bayes classifier is used to learn a relationship between hop counts in the matrix and hop counts of a neighboring node.
However, there are still the following deficiencies in the prior art:
1. A missing value in the hop count matrix is determined by constructing a single feature, which has limited ability to discriminate between classes, and therefore the matrix recovery is not effective.
2. Although some of information between nodes contained in the hop count matrix is missing, relationships between different nodes are still observed, and the observed partial information is not well utilized in the prior art.
3. The performance of the hop count matrix recovery method in the prior art is not verified in non-range-based localization, only a recovery error between the recovered hop count matrix and the original hop count matrix is determined through comparison, which cannot prove the significance of the hop count matrix recovery for IoT applications.
In order to solve the problem of ineffective recovery of missing values in a hop count matrix in the prior art, the present disclosure provides a method of hop count matrix recovery based on a decision tree classifier, which achieves a more accurate prediction result for missing hop counts and greatly improves the recovery capability for the hop count matrix.
To achieve the foregoing objective of the present disclosure, the following technical solution is used: a method of hop count matrix recovery based on a decision tree classifier, including the following steps:
S1: performing a flooding process to acquire a hop count matrix H with missing entries;
S2: constructing a training sample set according to relationships between a part of observed hop counts in the hop count matrix H, and modeling the observed hop counts in the hop count matrix as labels of the training sample set, where a maximum hop count represents a number of classes;
S3: training a decision tree classifier according to the training sample set obtained in step S2; and
S4: constructing a feature for an unobserved hop count, to obtain an unknown sample; and inputting the unknown sample to the trained decision tree classifier, to obtain a class of the unknown sample, which equals a missing hop count at a corresponding position in the matrix, so as to recover a complete hop count matrix H.
Preferably, a hop count vector of any node i is constructed: hi={hi1, hi2, . . . , hin}, hij represents a hop count between node i and node j, i=1, . . . , n; j=1, . . . , n; and
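the hop count matrix {tilde over (H)} with missing entries is represented by formula (2):
$$\tilde{H} = H \odot \Omega, \qquad (2)$$
wherein ⊙ represents a Hadamard product; Ω=[ωij]n×n is a binary matrix; ωij indicates whether a hop count at a corresponding position of the hop count matrix is missing, and is expressed as follows:
$$\omega_{ij} = \begin{cases} 1, & \text{if } h_{ij} \text{ is observed}, \\ 0, & \text{otherwise}. \end{cases}$$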
Preferably, after step S1 and before step S2, if a symmetric position of a missing hop count is observed, the missing hop count is supplemented by using a hop count at the symmetric position.
Further, step S2 specifically includes:
S201: traversing the hop count matrix {tilde over (H)}, and performing a next step if a hop count at a certain position is observed; otherwise, traversing a next value in the hop count matrix {tilde over (H)};
S202: using node i and node j to represent the two nodes between which the hop count is observed, and with respect to all other nodes k, k=1, 2, . . . , n in the network, calculating a minimum sum of a hop count {tilde over (h)}ik from node i to node k and a hop count {tilde over (h)}kj from node k to node j, as a first feature of a training sample;
S203: calculating an average value of hop counts from neighboring nodes of node i to node j and hop counts from neighboring nodes of node j to node i, as a second feature of the training sample;
S204: using the observed hop count as a class, forming the training sample by using the class together with the first feature and second feature, and adding the training sample to the training sample set; and
S205: traversing the entire hop count matrix {tilde over (H)}, and obtaining the training sample set after the traversing is finished.
Further, step S203 specifically includes: initializing two neighbor lists Li and Lj, selecting neighbors of node i according to a hop count vector {tilde over (h)}i, and selecting neighbors of node j according to a hop count vector {tilde over (h)}j; storing indexes of neighboring nodes of node i and indexes of neighboring nodes of node j in the corresponding neighbor lists Li and Lj respectively, where a variable ni represents the number of available neighboring nodes of node i, and a variable nj represents a number of available neighboring nodes of node j; and if a hop count {tilde over (h)}L
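$$f_2 = \frac{\sum_{p \in L_i} \tilde{h}_{pj} + \sum_{q \in L_j} \tilde{h}_{qi}}{n_i + n_j}, \qquad (4)$$
wherein f2 represents a second feature of the training sample, and the sums run over the ni and nj available neighboring nodes stored in Li and Lj respectively.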
Further, the constructed training sample set is expressed as formula (5):
S=[s1,s2, . . . ,sN]=[(f11,f12,c1);(f21,f22,c2); . . . (fN1,fN2,cN)], (5)
wherein fm1 represents a first feature of an mth training sample, fm2 represents a second feature of the mth training sample, and cm∈{1, 2, . . . , hmax} represents a class of the mth training sample, m=1, 2, . . . , N, and hmax is a maximum hop count.
Further, step S4 specifically includes the following sub-steps:
S401: if the hop count at a certain position in the hop count matrix {tilde over (H)} is not observed, performing a next step; otherwise, traversing a next value in the hop count matrix {tilde over (H)};
S402: using node i and node j to represent two nodes between which a hop count is missing, and with respect to all other nodes k, k=1, 2, . . . , n in the network, calculating a minimum sum of a hop count {tilde over (h)}ik from node i to node k and a hop count {tilde over (h)}kj from node k to node j, as a first feature of the unknown sample;
S403: calculating an average value of hop counts from neighboring nodes of node i to node j and hop counts from neighboring nodes of node j to node i, as a second feature of the unknown sample;
S404: forming the unknown sample by using the constructed first feature and second feature of the unknown sample, inputting the unknown sample to the trained decision tree classifier to obtain the class of the unknown sample, that is, obtain the hop count at the position; and
S405: if traversing of the hop count matrix {tilde over (H)} is not finished, traversing a next value in the matrix and returning to step S401; and if traversing of the hop count matrix {tilde over (H)} is finished, obtaining the recovered hop count matrix.
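Further, each training sample in the constructed training sample set is expressed as follows:
$$s = (f_1, f_2, c) = \left(\min_{k}(\tilde{h}_{ik} + \tilde{h}_{kj}),\; f_2,\; \tilde{h}_{ij}\right),$$
wherein min({tilde over (h)}ik+{tilde over (h)}kj) represents a minimum sum of a hop count {tilde over (h)}ik from node i to node k and a hop count {tilde over (h)}kj from node k to node j.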
A computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the foregoing method when executing the computer program.
A computer-readable storage medium storing a computer program is provided, where the steps of the foregoing method are implemented when the computer program is executed by a processor.
The present disclosure has the following beneficial effects.
The present disclosure makes full use of observed information in a hop count matrix with missing entries. Multi-dimensional features are constructed, and the joint analysis of these features yields more accurate predictions of missing hop counts, thereby greatly improving the recovery capability for the hop count matrix.
The present disclosure is described in detail below with reference to the accompanying drawings and embodiments.
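As shown in the accompanying drawings, a method of hop count matrix recovery based on a decision tree classifier includes the following steps: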
S1: Perform a flooding process to acquire a hop count matrix {tilde over (H)} with missing entries.
S2: Construct a training sample set according to relationships between a part of observed hop counts in the hop count matrix {tilde over (H)}, and model the observed hop counts in the hop count matrix as labels of the training sample set, where a maximum hop count represents the number of classes.
S3: Train a decision tree classifier according to the training sample set obtained in step S2.
S4: Construct a feature for an unobserved hop count, to obtain an unknown sample; and input the unknown sample to the trained decision tree classifier, to obtain a class of the unknown sample, which equals a missing hop count at a corresponding position in the matrix, so as to recover a complete hop count matrix H.
In a specific embodiment, step S1 is carried out as follows.
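Through a flooding process, every node in the network can obtain its minimum hop counts to the other nodes. A hop count vector of any node i is constructed: hi={hi1, hi2, . . . , hin}, where hij represents a hop count between node i and node j; i=1, . . . , n; j=1, . . . , n. A hop count matrix is constructed by using hop count vectors of all the nodes, as shown in formula (1):
$$H = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1n} \\ h_{21} & h_{22} & \cdots & h_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ h_{n1} & h_{n2} & \cdots & h_{nn} \end{bmatrix}. \qquad (1)$$
However, in the flooding process, due to node failures or attacks from malicious nodes, some entries in the hop count matrix H are missing, where the hop count matrix with missing entries is expressed as formula (2):
$$\tilde{H} = H \odot \Omega, \qquad (2)$$
wherein ⊙ represents a Hadamard product; Ω=[ωij]n×n is a binary matrix; ωij indicates whether a hop count at a corresponding position of the hop count matrix is missing, and is expressed as follows:
$$\omega_{ij} = \begin{cases} 1, & \text{if } h_{ij} \text{ is observed}, \\ 0, & \text{otherwise}. \end{cases}$$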
The objective of the method in this embodiment is to obtain the original hop count matrix H from the matrix {tilde over (H)} with missing entries. To this end, the recovery problem of the hop count matrix is modeled as a classification problem.
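The masking in formula (2) can be illustrated with a minimal sketch. The 4-node line network, the names `H`, `OMEGA`, `mask` and `missing_positions`, and the convention that 1 in Ω marks an observed entry are assumptions made for illustration only.

```python
# Hypothetical 4-node line network 0-1-2-3 (complete hop count matrix).
H = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]

# Binary mask: 1 = observed, 0 = missing (assumed convention).
OMEGA = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]


def mask(hops, omega):
    """Element-wise (Hadamard) product of the hop matrix and the mask."""
    return [[h * w for h, w in zip(h_row, w_row)]
            for h_row, w_row in zip(hops, omega)]


def missing_positions(h_tilde):
    """Off-diagonal zeros are treated as missing hop counts."""
    n = len(h_tilde)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and h_tilde[i][j] == 0]


H_TILDE = mask(H, OMEGA)
print(missing_positions(H_TILDE))
```

Because a valid hop count between two distinct nodes is at least 1, a zero off the diagonal can serve as the "missing" marker in this sketch.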
In a specific embodiment, after step S1 and before step S2, in order to increase the number of training samples, if a symmetric position of a missing hop count is observed, the missing hop count is supplemented by using a hop count at the symmetric position. That is, if one of the hop count {tilde over (h)}ij from node i to node j and the hop count {tilde over (h)}ji from node j to node i is observed, the hop count at the other position is supplemented by the hop count at the symmetric position.
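The symmetric supplementation described above could look roughly like the following sketch. The function name `fill_symmetric` and the use of 0 as the marker for a missing entry are illustrative assumptions.

```python
def fill_symmetric(h_tilde):
    """Copy an observed hop count to its missing symmetric position."""
    n = len(h_tilde)
    filled = [row[:] for row in h_tilde]
    for i in range(n):
        for j in range(n):
            # Entry (i, j) is missing but (j, i) was observed.
            if i != j and h_tilde[i][j] == 0 and h_tilde[j][i] != 0:
                filled[i][j] = h_tilde[j][i]
    return filled


# Hypothetical 3-node example: (0, 2) and (2, 1) are missing.
example = [
    [0, 1, 0],
    [1, 0, 1],
    [2, 0, 0],
]
print(fill_symmetric(example))
```

Positions where both symmetric entries are missing stay at 0 and are left for the classifier.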
In a specific embodiment, the training sample set is initialized to be null. To construct the training sample set, the hop count matrix needs to be traversed. When the hop count {tilde over (h)}ij between node i and node j is not equal to 0, it indicates that the hop count between node i and node j is observed, and the hop count is used as a label of a training sample.
In a specific embodiment, for the recovery of the hop count matrix, the only available information is the partially observed entries in the incomplete hop count matrix. Therefore, in this embodiment, the training sample set is constructed by using the observed entries, and topological information in the network is used to train the decision tree classifier. The training sample set is constructed by using relationships between hop counts in the network. The hop counts in the hop count matrix are modeled as labels of the training sample set, and a maximum hop count hmax in the network represents the number of classes, that is, the hop count in the network is an integer from 1 to hmax.
Step S2 specifically includes the following sub-steps:
S201: Initialize the training sample set to be null; in order to construct the training sample set, traverse the hop count matrix {tilde over (H)}, and when the hop count {tilde over (h)}ij between node i and node j is not equal to 0, which indicates that the hop count between node i and node j is observed, use the hop count as a label of a training sample, and perform a next step; otherwise, traverse a next value in the hop count matrix {tilde over (H)}.
S202: According to features of the hop count matrix, calculate hop counts of node i and node j with respect to other nodes k in the network, as shown in formula (3):
$$h_{ij} = \min_{k}(h_{ik} + h_{kj}), \quad k = 1, 2, \ldots, n,\ k \neq i,\ k \neq j \qquad (3)$$
For a hop count matrix without missing entries, when hij≠1, hij=min(hik+hkj), k=1, 2, . . . , n, k≠i, k≠j. However, for a hop count matrix with missing entries, the relationship between {tilde over (h)}ij and min({tilde over (h)}ik+{tilde over (h)}kj) is not clear. Therefore, the relationship between the two can be learned through a decision tree classifier.
Therefore, for all other nodes k, k=1, 2, . . . , n in the network, if the hop count {tilde over (h)}ik from node i to node k and the hop count {tilde over (h)}kj from node k to node j are both observed, a sum of the two hop counts is calculated. All the nodes in the network are traversed to obtain a minimum value as a first feature of a training sample:
$$f_1 = \min_{k}(\tilde{h}_{ik} + \tilde{h}_{kj}).$$
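The first feature of step S202 can be sketched as follows. The name `first_feature`, the example matrix, and the choice to return 0 when no relay node has both hop counts observed are assumptions, since the source does not state how that case is handled.

```python
def first_feature(h_tilde, i, j):
    """Minimum of h_ik + h_kj over relay nodes k with both hops observed."""
    n = len(h_tilde)
    sums = [h_tilde[i][k] + h_tilde[k][j]
            for k in range(n)
            if k != i and k != j
            and h_tilde[i][k] != 0 and h_tilde[k][j] != 0]
    return min(sums) if sums else 0  # 0 = no usable relay (assumption)


# Hypothetical line network 0-1-2-3 with entries (0, 2) and (1, 3) missing.
H_TILDE = [
    [0, 1, 0, 3],
    [1, 0, 1, 0],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
print(first_feature(H_TILDE, 0, 2), first_feature(H_TILDE, 0, 1))
```

For the directly connected pair (0, 1) the feature is larger than the true hop count of 1, which matches the restriction hij≠1 on formula (3) and is one reason the mapping from feature to hop count is learned rather than read off directly.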
S203: Because a single feature has limited prediction capability, construct neighbor information of the node pair as a further feature, so that the features jointly classify an unknown hop count. The hop count between node i and node j is closely related to the hop counts from the neighboring nodes of node i to node j and the hop counts from the neighboring nodes of node j to node i. Because the neighboring nodes are distributed around the node, the hop count from a neighboring node of node i to node j can only be {tilde over (h)}ij−1, {tilde over (h)}ij, or {tilde over (h)}ij+1. Therefore, when the node distribution is even enough and no neighboring node is missing, the hop count between node i and node j can be obtained from the hop counts from the neighboring nodes of node i to node j. However, because the node distribution in the network is not completely even and some neighboring nodes may be missing, this relationship also needs to be learned by using the decision tree classifier. An average value of the hop counts from the neighboring nodes of node i to node j and the hop counts from the neighboring nodes of node j to node i is calculated to serve as a second feature of the training sample.
Specifically, two neighbor lists Li and Lj are initialized, neighbors of node i are selected according to a hop count vector {tilde over (h)}i, and neighbors of node j are selected according to a hop count vector {tilde over (h)}j. Indexes of neighboring nodes of node i are stored in the neighbor list Li, and indexes of neighboring nodes of node j are stored in the neighbor list Lj. Variables ni and nj respectively represent the numbers of available neighboring nodes of node i and node j. If a hop count {tilde over (h)}L
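The second feature of the training sample is calculated as shown in equation (4):
$$f_2 = \frac{\sum_{p \in L_i} \tilde{h}_{pj} + \sum_{q \in L_j} \tilde{h}_{qi}}{n_i + n_j}, \qquad (4)$$
where the sums run over the ni and nj available neighboring nodes stored in Li and Lj respectively.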
S204: Use the observed hop count as a class, and form a training sample by using the class together with the constructed first feature and second feature of the sample, where the training sample is expressed as follows:
$$s = (f_1, f_2, c), \quad c = \tilde{h}_{ij}.$$
The training sample is added to the training sample set.
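Steps S203 and S204 can be sketched as below. Two points are assumptions rather than statements from the source: a neighbor of a node is taken to be any node at an observed hop count of 1, and a neighbor counts as "available" when its hop count to the opposite node is observed. The names `second_feature` and `make_training_sample` are hypothetical.

```python
def second_feature(h_tilde, i, j):
    """Average hop count from neighbors of i to j and neighbors of j to i."""
    n = len(h_tilde)
    list_i = [k for k in range(n) if h_tilde[i][k] == 1]  # neighbor list L_i
    list_j = [k for k in range(n) if h_tilde[j][k] == 1]  # neighbor list L_j
    hops_i = [h_tilde[p][j] for p in list_i if h_tilde[p][j] != 0]
    hops_j = [h_tilde[q][i] for q in list_j if h_tilde[q][i] != 0]
    count = len(hops_i) + len(hops_j)                     # n_i + n_j
    if count == 0:
        return 0.0  # no available neighbor (assumed fallback)
    return (sum(hops_i) + sum(hops_j)) / count


def make_training_sample(h_tilde, i, j, f1):
    """Form (f1, f2, c) for an observed entry; the class is the hop count."""
    return (f1, second_feature(h_tilde, i, j), h_tilde[i][j])


# Hypothetical line network 0-1-2-3 with entries (0, 2) and (1, 3) missing.
H_TILDE = [
    [0, 1, 0, 3],
    [1, 0, 1, 0],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
# f1 = 2 for the pair (2, 0): relay node 1 gives 1 + 1.
sample = make_training_sample(H_TILDE, 2, 0, 2)
print(sample)
```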
S205: Traverse the hop count matrix {tilde over (H)}, and obtain the training sample set after the traversing is finished, where the training sample set is expressed as formula (5):
S=[s1,s2, . . . ,sN]=[(f11,f12,c1);(f21,f22,c2); . . . (fN1,fN2,cN)], (5)
wherein fm1 represents a first feature of an mth training sample, fm2 represents a second feature of the mth training sample, and cm∈{1, 2, . . . , hmax} represents a class of the mth training sample; m=1, 2, . . . , N, and N is the number of training samples.
The decision tree classifier is trained by using the obtained training sample set.
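The source does not specify which decision tree algorithm or splitting criterion is used. As one possible reading, the sketch below trains a small CART-style tree with Gini impurity on samples of the form (f1, f2, c). The function names, the depth limit, and the seven training samples are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(samples):
    """Return (feature index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini([s[-1] for s in samples])
    for f in range(len(samples[0]) - 1):
        values = sorted({s[f] for s in samples})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [s[-1] for s in samples if s[f] <= t]
            right = [s[-1] for s in samples if s[f] > t]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(samples)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(samples, depth=0, max_depth=5):
    """Internal nodes are (feature, threshold, left, right); leaves are classes."""
    split = best_split(samples) if depth < max_depth else None
    if split is None:
        return Counter(s[-1] for s in samples).most_common(1)[0][0]
    f, t = split
    left = [s for s in samples if s[f] <= t]
    right = [s for s in samples if s[f] > t]
    return (f, t,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def predict(tree, features):
    """Walk from the root to a leaf and return its class (hop count)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if features[f] <= t else right
    return tree


# Hypothetical training set S of (f1, f2, c) samples.
S = [(2, 1.4, 1), (2, 1.5, 1), (2, 2.0, 2), (2, 2.2, 2),
     (3, 2.9, 3), (3, 3.1, 3), (4, 4.0, 4)]
TREE = build_tree(S)
print(predict(TREE, (2, 1.6)), predict(TREE, (3, 3.0)))
```

In this toy set the first feature alone cannot separate classes 1 and 2 (both have f1 = 2), while the second feature can, which illustrates why the two features are used jointly.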
In a specific embodiment, a feature is constructed for each unobserved hop count, to obtain unknown samples; and the hop count matrix {tilde over (H)} is traversed again, to find missing hop counts in the hop count matrix {tilde over (H)}. Specifically:
S401: If a hop count at a certain position in the hop count matrix {tilde over (H)} is not observed, perform a next step; otherwise, traverse a next value in the hop count matrix {tilde over (H)}.
S402: For each missing hop count, construct a first feature of an unknown sample: with respect to all other nodes k, k=1, 2, . . . , n in the network, if a hop count {tilde over (h)}ik from node i to node k and a hop count {tilde over (h)}kj from node k to node j are both observed, calculate a sum of the two hop counts. All the nodes in the network are traversed to obtain a minimum value as the first feature of the unknown sample:
$$f'_1 = \min_{k}(\tilde{h}_{ik} + \tilde{h}_{kj}).$$
S403: Initialize two neighbor lists Li and Lj, select neighbors of node i according to a hop count vector {tilde over (h)}i, and select neighbors of node j according to a hop count vector {tilde over (h)}j. Indexes of neighboring nodes of node i and indexes of neighboring nodes of node j are stored in the neighbor lists Li and Lj respectively. Variables ni and nj respectively represent the numbers of available neighboring nodes of node i and node j. If a hop count {tilde over (h)}L
S404: Form an unknown sample by using the constructed first feature and second feature of the unknown sample, where the unknown sample is expressed as follows:
$$s' = (f'_1, f'_2).$$
The unknown sample is inputted to the trained decision tree classifier to obtain a class of the unknown sample, that is, obtain the hop count at the position.
S405: If traversing of the hop count matrix {tilde over (H)} is not finished, traverse a next value in the matrix and return to step S401; and if traversing of the hop count matrix {tilde over (H)} is finished, obtain the recovered hop count matrix.
In this embodiment, features are constructed for all missing values in the hop count matrix {tilde over (H)}, and then are classified by using the decision tree classifier, to obtain the hop count at each position with the missing value, thereby obtaining the recovered complete hop count matrix.
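The traversal in steps S401 to S405 can be sketched as follows. The classifier is passed in as a callable so that the loop stays independent of how the tree was trained. The stand-in rule used in the example (returning the first feature) is not the trained decision tree of this embodiment and is there only to keep the sketch self-contained. All names are hypothetical, and neighbors are again assumed to be nodes at an observed hop count of 1.

```python
def features(h_tilde, i, j):
    """Build the unknown sample (f1', f2') for position (i, j)."""
    n = len(h_tilde)
    relay = [h_tilde[i][k] + h_tilde[k][j] for k in range(n)
             if k not in (i, j) and h_tilde[i][k] and h_tilde[k][j]]
    f1 = min(relay) if relay else 0
    near_i = [h_tilde[p][j] for p in range(n)
              if h_tilde[i][p] == 1 and h_tilde[p][j]]
    near_j = [h_tilde[q][i] for q in range(n)
              if h_tilde[j][q] == 1 and h_tilde[q][i]]
    count = len(near_i) + len(near_j)
    f2 = (sum(near_i) + sum(near_j)) / count if count else 0.0
    return (f1, f2)


def recover(h_tilde, classify):
    """Fill every missing off-diagonal entry with the predicted class."""
    n = len(h_tilde)
    recovered = [row[:] for row in h_tilde]
    for i in range(n):
        for j in range(n):
            if i != j and h_tilde[i][j] == 0:            # S401: not observed
                sample = features(h_tilde, i, j)         # S402 and S403
                recovered[i][j] = classify(sample)       # S404
    return recovered                                     # S405


# Hypothetical line network 0-1-2-3 with entries (0, 2) and (1, 3) missing.
H_TILDE = [
    [0, 1, 0, 3],
    [1, 0, 1, 0],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
# Stand-in classifier (assumption): predict the first feature directly.
RECOVERED = recover(H_TILDE, lambda sample: sample[0])
print(RECOVERED)
```

Features are computed from the observed matrix only, so the order in which missing entries are visited does not affect the result in this sketch.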
An application scenario of this embodiment is described below to show the advantage of the method in this embodiment over other methods. The parameter values used in the experiments are examples and are not fixed. The performance of this embodiment is compared with the performance of a method of hop count matrix recovery based on naive Bayes classifier (HCMR-NBC) and the performance of a singular value thresholding (SVT) method. For ease of description, the comparison solutions and the method of hop count matrix recovery based on decision tree (HCMR-DT) in this embodiment are all denoted by abbreviations in the figures. The normalized root mean square error (NRMSE) of the recovered hop count matrix relative to the original hop count matrix is used to evaluate the recovery performance.
A 100×100 square monitored area containing 20 anchor nodes is considered, where the nodes are randomly distributed in the monitored area. The communication radii of the anchor nodes and unknown nodes are set to 20. In the experiments, effects of observation ratio, matrix dimension (obtained by varying the node density) and network topology (specifically, uniform network, O-type network and S-type network) on the performance of the algorithm were considered. In addition, to reduce random errors, all the experiments were repeated 100 times to obtain final results.
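A simulation setup of this kind could be generated roughly as follows. The sketch deploys nodes uniformly at random in a 100×100 area and computes minimum hop counts by breadth-first search over links no longer than the communication radius of 20, which yields the same matrix that an error-free flooding would converge to. The total of 50 nodes, the seed, and the function names are assumptions; the source states only the number of anchor nodes.

```python
import math
import random
from collections import deque


def deploy(n_nodes, side=100.0, seed=0):
    """Place nodes uniformly at random in a side x side square."""
    rng = random.Random(seed)
    return [(rng.uniform(0, side), rng.uniform(0, side))
            for _ in range(n_nodes)]


def hop_count_matrix(positions, radius=20.0):
    """Minimum hop counts between all node pairs via breadth-first search."""
    n = len(positions)
    adj = [[k for k in range(n)
            if k != i and math.dist(positions[i], positions[k]) <= radius]
           for i in range(n)]
    hops = [[0] * n for _ in range(n)]
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            hops[src][v] = d
    return hops


# Deterministic check: four nodes on a line, 15 units apart.
LINE = hop_count_matrix([(0, 0), (15, 0), (30, 0), (45, 0)])
# Random deployment with an assumed total of 50 nodes.
H = hop_count_matrix(deploy(50))
print(LINE)
```

Pairs that cannot reach each other keep a hop count of 0 here, so a disconnected deployment would need to be regenerated or handled separately before masking.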
Effects of different matrix dimensions on the reconstruction performance of the algorithm were first compared experimentally.
This embodiment also compares the effect of observation ratio on the reconstruction performance of the algorithm.
In addition, the adaptability of the method described in this embodiment to different networks was also evaluated. The comparison was carried out on network topologies with uniform node distribution, S-type node distribution and O-type node distribution.
The hop count matrix recovered by using the method proposed in this embodiment has a wide range of applications in the field of IoT. In order to verify the significance of hop count recovery for IoT applications, the effectiveness of hop count matrix recovery is verified here with localization as a practical scenario. The classical DV-hop algorithm is used as the localization algorithm. The hop count matrices recovered by using the solution of this embodiment and the two comparison solutions, together with the hop count matrix without missing entries, are each used for localization, and an average localization error of unknown nodes is evaluated using the NRMSE of localization, which is calculated as shown in equation (8):
wherein (x̂k, ŷk) represents an estimated position of an unknown node k, (xk, yk) represents a real position of the unknown node k, Nu represents the number of unknown nodes, and R is the communication radius of each node.
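For orientation, the classical DV-hop procedure mentioned above can be sketched in three steps: each anchor estimates an average hop size from its distances and hop counts to the other anchors, an unknown node converts hop counts to distances using the hop size of its closest anchor, and the position is obtained by linearized least squares. DV-hop has several variants, so this is one common formulation rather than necessarily the exact one used in the experiments. The function name, the anchor layout, and the hop counts in the example are hypothetical.

```python
import math


def dv_hop(hops, anchor_pos, node):
    """Estimate the position of `node` from hop counts to the anchors."""
    anchors = sorted(anchor_pos)
    # Step 1: average hop size of each anchor.
    hop_size = {}
    for a in anchors:
        others = [b for b in anchors if b != a]
        total_d = sum(math.dist(anchor_pos[a], anchor_pos[b]) for b in others)
        total_h = sum(hops[a][b] for b in others)
        hop_size[a] = total_d / total_h
    # Step 2: distances from the hop size of the closest anchor.
    nearest = min(anchors, key=lambda a: hops[node][a])
    d = {a: hop_size[nearest] * hops[node][a] for a in anchors}
    # Step 3: subtract the last anchor's circle equation, then solve the
    # 2x2 normal equations of the resulting linear system.
    ref = anchors[-1]
    xr, yr = anchor_pos[ref]
    rows, rhs = [], []
    for a in anchors[:-1]:
        xa, ya = anchor_pos[a]
        rows.append((2 * (xa - xr), 2 * (ya - yr)))
        rhs.append(xa**2 - xr**2 + ya**2 - yr**2 + d[ref]**2 - d[a]**2)
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    t1 = sum(r[0] * b for r, b in zip(rows, rhs))
    t2 = sum(r[1] * b for r, b in zip(rows, rhs))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Hypothetical example: anchors 0, 1, 2 and unknown node 3.
ANCHORS = {0: (0.0, 0.0), 1: (40.0, 0.0), 2: (0.0, 40.0)}
HOPS = [
    [0, 4, 4, 2],
    [4, 0, 6, 2],
    [4, 6, 0, 5],
    [2, 2, 5, 0],
]
ESTIMATE = dv_hop(HOPS, ANCHORS, 3)
print(ESTIMATE)
```

Because every distance estimate is a multiple of a hop size, an error of a single hop in the recovered matrix shifts the estimate by roughly one hop length, which is why recovery accuracy matters for localization.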
It can be seen that the method proposed in this embodiment maintains the best localization performance throughout. In addition, the localization performance of the recovered hop count matrices was compared at different matrix dimensions with an observation ratio of 40%.
A computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following method steps when executing the computer program:
S1: Perform a flooding process to acquire a hop count matrix {tilde over (H)} with missing entries.
S2: Construct a training sample set according to relationships between a part of observed hop counts in the hop count matrix {tilde over (H)}, and model the observed hop counts in the hop count matrix as labels of the training sample set, where a maximum hop count represents the number of classes.
S3: Train a decision tree classifier according to the training sample set obtained in step S2.
S4: Construct features for unobserved hop counts, to obtain unknown samples.
S5: Input the unknown samples to the trained decision tree classifier, to obtain classes of the unknown samples, that is, obtain missing hop counts at corresponding positions in the matrix, to recover a complete hop count matrix H.
A computer readable storage medium storing a computer program is provided, where the following method steps are implemented when the computer program is executed by a processor:
S1: Perform a flooding process to acquire a hop count matrix {tilde over (H)} with missing entries.
S2: Construct a training sample set according to relationships between a part of observed hop counts in the hop count matrix {tilde over (H)}, and model the observed hop counts in the hop count matrix as labels of the training sample set, where a maximum hop count represents the number of classes.
S3: Train a decision tree classifier according to the training sample set obtained in step S2.
S4: Construct features for unobserved hop counts, to obtain unknown samples.
S5: Input the unknown samples to the trained decision tree classifier, to obtain classes of the unknown samples, that is, obtain missing hop counts at corresponding positions in the matrix, to recover a complete hop count matrix H.
It is apparent that the above embodiments are merely intended to describe the present disclosure clearly, rather than to limit the implementations of the present disclosure. Any modification, equivalent substitution and improvement made within the spirit and principle of the present disclosure should fall within the protection scope of the claims of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
202110278022.4 | Mar 2021 | CN | national |