This application is a national stage application under 35 U.S.C. 371 of PCT Application No. PCT/CN2019/076718, filed on 1 Mar. 2019, which claims the benefit of Chinese Patent Application No. 201811338415.4, filed on 9 Nov. 2018, the entire disclosure of each of which is hereby incorporated herein by reference.
The present disclosure relates to the field of mobile communication, and in particular, to a method for measuring an antenna downtilt angle based on a multi-scale deep semantic segmentation network.
Nowadays, in the era of network information, the quality of mobile communication networks is extremely important. In GSM-R construction and planning, as shown in
There are two traditional methods for measuring the antenna downtilt angle. The first is to climb the antenna base station and take the measurement manually with an instrument (a compass, a slope meter, or the like). The second is to install an angle sensor on the antenna that returns data. Because wind, snow, and other factors can change the downtilt angle, the antenna needs to be measured regularly. With the first method, base stations are tall and antennas are numerous, so the safety hazard to personnel and the workload are both large, and practicability is low. With the second method, installation takes a long time and antenna models differ, so the cost of installing the instruments is high and practicability is likewise low. Both methods consume considerable manpower and material resources and are not suitable for today's large-scale measurement.
To solve the above problems, the present disclosure aims at providing a method for measuring an antenna downtilt angle based on a multi-scale deep semantic segmentation network. The method for measuring a downtilt angle of a mobile base station antenna by calling a target detection algorithm and a semantic segmentation algorithm and using an unmanned aerial vehicle as a carrier is highly applicable, cost-effective, and safe.
The technical scheme adopted by the present disclosure to solve the problems is as follows:
An antenna downtilt angle measuring method based on a multi-scale deep semantic segmentation network, including:
collecting image data: base station antenna data is collected by using an unmanned aerial vehicle and antenna images collected are taken as a data set;
predicting a target bounding box: a target antenna in the data set is positioned, and a bounding box is predicted by logistic regression;
performing target recognition and semantic segmentation: target features of the target antenna in the data set are extracted, the target features are learned and processed by an activation function, a target image is output for semantic image segmentation, and pixel points of the target image and the background are classified; and
calculating an antenna downtilt angle: the width and height of an antenna box are obtained according to a border of the target image to calculate the antenna downtilt angle.
Further, the collecting image data includes:
locating the unmanned aerial vehicle on the top of a pole of a base station antenna, and recording the longitude and latitude (L0, W0) of the pole in the vertical direction; causing the unmanned aerial vehicle to fly around a point of the base station antenna, setting a flight radius of the unmanned aerial vehicle, and the unmanned aerial vehicle moving around the pole along the radius on the same horizontal plane to acquire antenna images with different attitudes and angles of a mobile base station antenna as a data set.
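By way of illustration only, the circular flight path described above may be sketched as a list of waypoints around the recorded pole position. The function name `orbit_waypoints`, the spherical Earth radius, the flat-earth conversion from metres to degrees, and the example coordinates are assumptions made for this sketch and are not part of the disclosure; altitude control and camera pointing are omitted.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical approximation)


def orbit_waypoints(lon0_deg, lat0_deg, radius_m, n_points):
    """Return n_points (lon, lat) pairs evenly spaced on a circle around the pole.

    (lon0_deg, lat0_deg) plays the role of the recorded pole position (L0, W0).
    """
    lat0 = math.radians(lat0_deg)
    waypoints = []
    for k in range(n_points):
        bearing = 2.0 * math.pi * k / n_points  # 0 = north, increasing clockwise
        d_north = radius_m * math.cos(bearing)
        d_east = radius_m * math.sin(bearing)
        # Small-offset conversion from metres to degrees of latitude/longitude.
        d_lat = math.degrees(d_north / EARTH_RADIUS_M)
        d_lon = math.degrees(d_east / (EARTH_RADIUS_M * math.cos(lat0)))
        waypoints.append((lon0_deg + d_lon, lat0_deg + d_lat))
    return waypoints


# Hypothetical pole position and a 10 m flight radius with 8 capture points.
for lon, lat in orbit_waypoints(113.0, 22.5, radius_m=10.0, n_points=8):
    print(f"{lon:.7f}, {lat:.7f}")
```

Keeping every waypoint at the same radius and on the same horizontal plane is what yields images of the antenna at different attitudes and angles from a constant distance.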
Further, the predicting a target bounding box includes:
positioning a target antenna in the antenna image and predicting a bounding box by logistic regression: the entire antenna image is first divided into N*N grids; after the antenna image is input, the entire image is predicted by scanning one grid at a time, and prediction of the target antenna starts when the center of the grid in which the target antenna is located is positioned; wherein the 4 coordinate values predicted for each bounding box are tx, ty, tw, and th, respectively, the upper-left offset of each target cell is (cx, cy), the width and height of the prior bounding boxes are pw and ph respectively, and the network predicts their values as:
$$b_x = \sigma(t_x) + c_x \tag{1}$$
$$b_y = \sigma(t_y) + c_y \tag{2}$$
$$b_w = p_w e^{t_w} \tag{3}$$
$$b_h = p_h e^{t_h} \tag{4}$$
where σ(·) denotes the activation function, which can be expressed as:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
where pw, ph denote the width and height of the prior bounding boxes respectively, and e denotes the natural constant, which is approximately equal to 2.71828;
where bx, by, bw, bh can be calculated according to the above formulas, wherein bw and bh denote the width and the height of the bounding boxes respectively,
where the input antenna image is divided into N*N grids, each grid includes five predictors (x, y, w, h, confidence) for each bounding box and C class probabilities, and the output of the network is of a size of S*S*(5*B+C), where S is the number of grids per side (that is, S=N); B is the number of the bounding boxes in each grid; C is the number of classes, which is 1 because the only class in the present disclosure is antenna; and confidence carries two pieces of information about the predicted grid, i.e., the confidence that it contains the target antenna and the prediction accuracy of the bounding box:
$$\text{confidence} = \Pr(\text{object}) \cdot \mathrm{IOU}_{\text{pred}}^{\text{truth}} \tag{5}$$
where IOU_pred^truth denotes the Intersection over Union between the bounding boxes and the prior bounding boxes, and a threshold of 0.5 is set. When Pr(object)=1, the target antenna falls in the center of the grid, that is, the bounding box currently predicted coincides with the ground-truth box of the object better than any other; if the predicted bounding box is not currently the best and its value is smaller than the threshold of 0.5, the bounding box is not predicted, and it is determined that the target antenna does not fall into the grid.
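By way of a non-limiting illustration, the bounding-box equations above may be sketched in Python as follows. The function names `sigmoid` and `decode_box` and the numeric inputs are assumptions chosen for this example; the units are grid cells, as implied by adding the cell offset (cx, cy).

```python
import math


def sigmoid(x):
    """Logistic activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Turn raw network outputs into a box centre (bx, by) and size (bw, bh)."""
    bx = sigmoid(tx) + cx   # centre x: offset within the cell plus cell column
    by = sigmoid(ty) + cy   # centre y: offset within the cell plus cell row
    bw = pw * math.exp(tw)  # width: prior width scaled exponentially
    bh = ph * math.exp(th)  # height: prior height scaled exponentially
    return bx, by, bw, bh


# Zero raw outputs place the centre in the middle of cell (6, 6)
# and leave the prior size unchanged.
print(decode_box(0.0, 0.0, 0.0, 0.0, cx=6, cy=6, pw=3.0, ph=5.0))
# (6.5, 6.5, 3.0, 5.0)
```

Because the sigmoid output lies strictly between 0 and 1, the predicted centre always stays inside the cell whose offset is (cx, cy).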
Further, the performing target recognition and semantic segmentation includes:
performing target recognition by using a network convolutional layer for feature extraction: an antenna image of 416*416 pixels with 3 channels is input; the layer has 32 convolution kernels, each of a size of 3*3, which are used to learn 32 feature maps; to account for color differences of the target antenna, features of the target antenna are learned by using different convolution kernels; convolutional layer up-sampling is performed during feature extraction, and a prediction formula for object classes is as follows:
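$$\Pr(\text{Class}_i \mid \text{object}) \cdot \Pr(\text{object}) \cdot \mathrm{IOU}_{\text{pred}}^{\text{truth}} = \Pr(\text{Class}_i) \cdot \mathrm{IOU}_{\text{pred}}^{\text{truth}} \tag{6}$$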
where Pr(Classi|object) is an object class probability;
then applying the activation function by logistic regression:
the predicted target output is constrained to the range between 0 and 1: the antenna image is processed by the activation function after feature extraction, and when the output value is greater than 0.5, the object is determined as an antenna;
then performing semantic image segmentation on the antenna image by using a deep convolutional network, and classifying the pixel points of the target image and the background:
after the target image is input, it first goes through feature extraction by a dilated convolutional network; and after a feature image is input, dilated convolution is calculated:
$$y[i] = \sum_{k} x[i + r \cdot k]\, w[k] \tag{8}$$
for a two-dimensional signal, y[i] is the output corresponding to each position i, x is the input signal, w is a filter, and the dilation rate r is the step size with which the input signal is sampled;
after the input image is processed by the convolutional network for output, pixel points of the output target image are classified by a fully connected conditional random field, and the classification is mainly performed for the target image and the background boundary.
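By way of illustration only, formula (8) may be sketched for a one-dimensional signal as follows; the two-dimensional case applies the same sampling along both axes. The function name `dilated_conv1d`, the choice to compute outputs only where the filter fits entirely inside the input, and the sample values are assumptions made for this sketch.

```python
def dilated_conv1d(x, w, rate):
    """Compute y[i] = sum_k x[i + rate*k] * w[k] wherever the filter fits."""
    span = rate * (len(w) - 1)  # distance between the first and last sampled input
    return [
        sum(x[i + rate * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

print(dilated_conv1d(signal, kernel, rate=1))  # [-2, -2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, rate=2))  # [-4, -4, -4]
```

With the same three filter weights, raising the rate from 1 to 2 widens the span of input covered by one output from 3 samples to 5, which is how dilated convolution enlarges the receptive field without adding parameters.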
Further, the calculating an antenna downtilt angle includes:
obtaining the width x and the height y of the antenna box according to the border of the target image, and calculating a downtilt angle of the base station antenna according to a geometric relation, the downtilt angle of the base station antenna being an angle θ between the base station antenna and a vertical plane:
The present disclosure has the following beneficial effects: the present disclosure adopts an antenna downtilt angle measuring method based on a multi-scale deep semantic segmentation network. The method for measuring a downtilt angle of a mobile base station antenna by calling a target detection algorithm and a semantic segmentation algorithm and using an unmanned aerial vehicle as a carrier is highly applicable, cost-effective, and safe.
The present disclosure is further described below with reference to the accompanying drawings and examples.
Referring to
collecting image data: base station antenna data is collected by using an unmanned aerial vehicle and antenna images collected are taken as a data set;
predicting a target bounding box: a target antenna in the data set is positioned, and a bounding box is predicted by logistic regression;
performing target recognition and semantic segmentation: target features of the target antenna in the data set are extracted, the target features are learned and processed by an activation function, a target image is output for semantic image segmentation, and pixel points of the target image and the background are classified; and
calculating an antenna downtilt angle: the width and height of an antenna box are obtained according to a border of the target image to calculate the antenna downtilt angle.
In the embodiment, the method for measuring a downtilt angle of a mobile base station antenna by calling a target detection algorithm and a semantic segmentation algorithm and using an unmanned aerial vehicle as a carrier is highly applicable, cost-effective, and safe.
Further, the step of collecting image data includes:
locating the unmanned aerial vehicle on the top of a pole of a base station antenna, and recording the longitude and latitude (L0, W0) of the pole in the vertical direction; causing the unmanned aerial vehicle to fly around a point of the base station antenna, setting a flight radius of the unmanned aerial vehicle, and the unmanned aerial vehicle moving around the pole along the radius on the same horizontal plane to acquire antenna images with different attitudes and angles of a mobile base station antenna as a data set.
Further, the step of predicting a target bounding box includes:
positioning a target antenna in the antenna image and predicting a bounding box by logistic regression: the entire antenna image is first divided into N*N grids; after the antenna image is input, the entire image is predicted by scanning one grid at a time, and prediction of the target antenna starts when the center of the grid in which the target antenna is located is positioned; wherein the 4 coordinate values predicted for each bounding box are tx, ty, tw, and th, respectively, the upper-left offset of each target cell is (cx, cy), the width and height of the prior bounding boxes are pw and ph respectively, box prediction is as shown in
$$b_x = \sigma(t_x) + c_x \tag{1}$$
$$b_y = \sigma(t_y) + c_y \tag{2}$$
$$b_w = p_w e^{t_w} \tag{3}$$
$$b_h = p_h e^{t_h} \tag{4}$$
where σ(·) denotes the activation function, which can be expressed as:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
where pw, ph denote the width and height of the prior bounding boxes respectively, and e denotes the natural constant, which is approximately equal to 2.71828; where bx, by, bw, bh can be calculated according to the above formulas, wherein bw and bh denote the width and the height of the bounding boxes respectively,
where the input antenna image is divided into N*N grids, each grid includes 5 predictors (x, y, w, h, confidence) for each bounding box and C class probabilities, and the output of the network is of a size of S*S*(5*B+C), where S is the number of grids per side (that is, S=N); B is the number of the bounding boxes in each grid; C is the number of classes, which is 1 because the only class in the present disclosure is antenna; and confidence carries two pieces of information about the predicted grid, i.e., the confidence that it contains the target antenna and the prediction accuracy of the bounding box:
$$\text{confidence} = \Pr(\text{object}) \cdot \mathrm{IOU}_{\text{pred}}^{\text{truth}} \tag{5}$$
where IOU_pred^truth denotes the Intersection over Union between the bounding boxes and the prior bounding boxes, and a threshold of 0.5 is set. When Pr(object)=1, the target antenna falls in the center of the grid, that is, the bounding box currently predicted coincides with the ground-truth box of the object better than any other; if the predicted bounding box is not currently the best and its value is smaller than the threshold of 0.5, the bounding box is not predicted, and it is determined that the target antenna does not fall into the grid.
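By way of illustration only, the confidence of formula (5) and the 0.5 threshold may be sketched as follows. The corner-coordinate box layout (x_min, y_min, x_max, y_max), the function names, and the example boxes are assumptions made for this sketch; `reference_box` stands for whichever box the prediction is compared against.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def confidence(pr_object, predicted_box, reference_box):
    """confidence = Pr(object) * IOU, as in formula (5)."""
    return pr_object * iou(predicted_box, reference_box)


def keep_prediction(pr_object, predicted_box, reference_box, threshold=0.5):
    """Discard the prediction when its confidence falls below the threshold."""
    return confidence(pr_object, predicted_box, reference_box) >= threshold


reference = (10, 10, 50, 110)   # hypothetical antenna box
close_fit = (12, 14, 52, 112)   # overlaps the reference almost entirely
poor_fit = (40, 60, 90, 160)    # overlaps only a corner of the reference

print(keep_prediction(1.0, close_fit, reference))  # True
print(keep_prediction(1.0, poor_fit, reference))   # False
```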
To improve target detection accuracy, multi-scale prediction is used. There is no need to fix the size of an input image, so different step sizes can be used to detect feature maps of different sizes. Three detection layers are used to detect the target antenna in the antenna image, and the different detection layers are realized by controlling the step size. The first detection layer is down-sampled with a step size of 32 to reduce the feature dimension. In order to connect with the earlier feature map of the same size, the layer is up-sampled, and a high resolution can be obtained at this point. The second detection layer uses a step size of 16, and the remaining feature processing is consistent with that of the first layer. The third layer uses a step size of 8, and feature prediction is performed thereon, so that the detection accuracy of the target antenna is ultimately higher.
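By way of illustration only, the grid resolution seen by each of the three detection layers follows from dividing the input size by the step size. The sketch below assumes the 416*416 input given for the feature-extraction step of this disclosure; the function name `detection_grid_sizes` is hypothetical.

```python
def detection_grid_sizes(input_size=416, strides=(32, 16, 8)):
    """Map each step size to the side length of the grid it produces."""
    return {stride: input_size // stride for stride in strides}


print(detection_grid_sizes())  # {32: 13, 16: 26, 8: 52}
```

The coarsest grid (13*13) suits antennas that fill much of the frame, while the finest grid (52*52) retains enough resolution for antennas that appear small in the image.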
Further, the step of performing target recognition and semantic segmentation includes:
performing target recognition by using a network convolutional layer for feature extraction: an antenna image of 416*416 pixels with 3 channels is input; the layer has 32 convolution kernels, each of a size of 3*3, which are used to learn 32 feature maps; to account for color differences of the target antenna, features of the target antenna are learned by using different convolution kernels; convolutional layer up-sampling is performed during feature extraction, and a prediction formula for object classes is as follows:
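$$\Pr(\text{Class}_i \mid \text{object}) \cdot \Pr(\text{object}) \cdot \mathrm{IOU}_{\text{pred}}^{\text{truth}} = \Pr(\text{Class}_i) \cdot \mathrm{IOU}_{\text{pred}}^{\text{truth}} \tag{6}$$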
wherein Pr(Classi|object) is an object class probability;
then applying the activation function by logistic regression:
the predicted target output is constrained to the range between 0 and 1: the antenna image is processed by the activation function after feature extraction, and when the output value is greater than 0.5, the object is determined as an antenna;
in the network layer structure, there are 53 convolutional layers and 22 residual layers among layers 0-74; layers 75-105 are feature interaction layers of the convolutional neural network, which can be divided into three scales; local feature interaction is realized by means of convolution kernels, and the network structure is as shown in
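In the production of the data set, only the antenna is detected, so the number of classes is 1. Therefore, in the training, the output of the last convolutional layer has 3*(1+4+1)=18 channels, that is, 3 bounding boxes per grid, each with 1 class score, 4 coordinate values, and 1 confidence value.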
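By way of illustration only, the channel count above may be computed from the number of boxes per grid and the number of classes. The function name `head_channels` is hypothetical.

```python
def head_channels(boxes_per_grid, num_classes):
    """Channels of the last convolutional layer for one detection scale."""
    per_box = num_classes + 4 + 1  # class scores + box coordinates + confidence
    return boxes_per_grid * per_box


print(head_channels(boxes_per_grid=3, num_classes=1))  # 18
```

If further classes were added to the data set, only `num_classes` would change, and the last convolutional layer would need to be resized accordingly.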
Semantic image segmentation is performed on the antenna image by using a deep convolutional network, and the pixel points of the target image and the background are classified.
After the target image is input, it first goes through feature extraction by a dilated convolutional network. Because the boundary precision obtained in this way is not high enough, pixels of the target image cannot be well separated from background pixels; combining a fully connected conditional random field improves the pixel classification at the image boundary, so that the segmentation effect is better.
Features are first extracted by using a dilated convolutional network. Feature extraction by the network convolutional layer can be divided into two cases: a low-resolution input image is feature-extracted by a standard convolutional layer, as shown in
In a network structure of a serial module and a spatial pyramid pooling layer module, the convolution with holes (dilated convolution) can effectively increase the receptive field of a filter and integrate multi-scale information. After a feature image is input, the dilated convolution is calculated:
$$y[i] = \sum_{k} x[i + r \cdot k]\, w[k] \tag{8}$$
For a two-dimensional signal, y[i] is the output corresponding to each position i, x is the input signal, w is a filter, and the dilation rate r is the step size with which the input signal is sampled. Dilated convolution enlarges the convolution kernel and thereby the receptive field of the filter. A residual module for multi-scale feature learning is used in the feature extraction network; the present disclosure uses the bottleneck block. In the bottleneck block, each convolution is followed by normalization and an activation function. Thus, contextual information is enriched, and the bottleneck block is as shown in
After the input image is processed by the convolutional network for output, pixel points of the output target image are classified by a fully connected conditional random field, and the classification is mainly performed for the target image and the background boundary.
A view of a random field is as shown in
where y is the reference value of xi, E(y|I) is an energy function.
An image function output through a dilated convolutional network is a unary potential function: A binary potential function is
The binary potential function describes the relationship between pixels and assigns the same label to similar pixel points. The unary potential function extracts feature vectors of a node in different feature maps, and the binary potential function connects the nodes extracted by the unary potential function to learn their edges. All the nodes are connected to form a fully connected conditional random field, and the image finally output is more accurate.
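By way of illustration only, the interplay of the unary and binary potential functions may be sketched with a toy energy calculation. The potential functions of this disclosure are not reproduced in the present text, so the sketch assumes one commonly used form: a unary term equal to the negative log-probability from the network, and a binary term built from Gaussian kernels on pixel position and intensity that penalises differing labels. All names, weights, and kernel widths below are assumptions.

```python
import math
from itertools import combinations


def crf_energy(labels, probs, positions, intensities,
               w_app=1.0, w_smooth=1.0, s_pos=3.0, s_int=10.0, s_smooth=3.0):
    """Energy of one labelling: unary terms plus binary terms over all pixel pairs."""
    # Unary term: negative log-probability the network gives each chosen label.
    energy = sum(-math.log(probs[i][labels[i]]) for i in range(len(labels)))
    # Binary term: every pair of pixels is connected (fully connected field).
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # no penalty when the two pixels share a label
        d_pos = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
        d_int = (intensities[i] - intensities[j]) ** 2
        appearance = math.exp(-d_pos / (2 * s_pos ** 2) - d_int / (2 * s_int ** 2))
        smoothness = math.exp(-d_pos / (2 * s_smooth ** 2))
        energy += w_app * appearance + w_smooth * smoothness
    return energy


# Three neighbouring pixels with similar intensity; label 0 = background, 1 = antenna.
probs = [[0.1, 0.9], [0.4, 0.6], [0.2, 0.8]]
positions = [(0, 0), (1, 0), (2, 0)]
intensities = [100, 102, 101]

consistent = crf_energy([1, 1, 1], probs, positions, intensities)
mixed = crf_energy([1, 0, 1], probs, positions, intensities)
print(consistent < mixed)  # True
```

A lower energy corresponds to a more plausible labelling, so similar neighbouring pixels are pushed toward the same label, which is what sharpens the boundary between the target image and the background.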
Further, the step of calculating an antenna downtilt angle includes:
obtaining the width x and the height y of the antenna box according to the border of the target image, and calculating a downtilt angle of the base station antenna according to a geometric relation, the downtilt angle of the base station antenna being an angle θ between the base station antenna and a vertical plane:
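The formula for θ is not reproduced in the present text. By way of illustration only, one geometric reading is sketched below: if the antenna is treated as a thin straight segment spanning the diagonal of the antenna box, its angle from the vertical satisfies tan θ = x / y. This relation, the function name `downtilt_angle_deg`, and the sample box dimensions are assumptions and should not be taken as the formula of the disclosure.

```python
import math


def downtilt_angle_deg(box_width, box_height):
    """Angle from vertical, in degrees, of a thin segment spanning the box diagonal."""
    return math.degrees(math.atan2(box_width, box_height))


# Hypothetical antenna box: 10 pixels wide and 100 pixels high.
print(round(downtilt_angle_deg(10, 100), 2))  # 5.71
```

Because only the ratio of width to height enters the calculation, the result does not depend on the image scale, provided the antenna is viewed side-on so that its tilt lies in the image plane.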
The above are merely preferred embodiments of the present disclosure. The present disclosure is not limited to the above implementations. As long as the implementations can achieve the technical effect of the present disclosure with the same means, they are all encompassed in the protection scope of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
201811338415.4 | Nov 2018 | CN | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2019/076718 | 3/1/2019 | WO |

Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2020/093630 | 5/14/2020 | WO | A |
Number | Date | Country | |
---|---|---|---|
20210215481 A1 | Jul 2021 | US |