The present application claims priority under 35 U.S.C. 119 and 35 U.S.C. 365 to Korean Patent Application No. 10-2022-0163299 (filed on 29 Nov. 2022), which is hereby incorporated by reference in its entirety.
The present disclosure relates to a system and method for detecting an object in an underground space, in which a convolutional filter modified to match camera distortion is utilized. More particularly, the present disclosure relates to a system and method that increase the detection rate of an object that is a region of interest (ROI) by applying a deep learning model directly to an underground facility image photographed using a movable body, without first correcting the large distortion or size change of the object, so that the underground facility may be diagnosed.
In a method for detecting an object according to the related art, the object is detected based on an algorithm that is applicable only when the degree of distortion or the size change of the object in an image is small. As a related technology, there is Korean Patent Registration No. 10-2303399 (Sep. 13, 2021). Therefore, there is a limitation in that such a method is difficult to apply when the distortion of the object or its degree of change is large. When an existing object detection algorithm is applied to an image of a wide-angle camera having a wide region of interest (ROI), detection performance deteriorates due to the distortion or the change in object size according to distance. In an image of a general camera having a narrow region of interest (ROI), the image change is small; for this reason, there is a limitation in that only a small range of objects may be detected in one operation of an object detection model.
Particularly, the algorithm for detecting the object according to the related art uses a fixed convolutional filter even when the same object appears with a different size or shape. For this reason, there is a limitation in that the object detection rate decreases.
Embodiments provide a system and method for detecting an underground facility, in which a convolutional filter modified to match camera distortion is utilized to robustly detect an object in an image, such as that of a wide-angle or omnidirectional camera, in which the size of an imaged object changes greatly according to distortion and distance.
Embodiments also provide a system and method for detecting an underground facility, in which a convolutional filter modified to match camera distortion is adapted in response to all camera-specific parameters, detects the same object even when its size or shape differs, and detects objects over a wide region with excellent performance by using a wide-angle camera.
Embodiments also provide a system and method for detecting an underground facility, in which a convolutional filter modified to match camera distortion is utilized to detect and diagnose facilities or installations disposed on both sides at once, because facilities over a wide region are detected using only one wide-angle device.
In one embodiment, an object detection terminal of a system for detecting an object in an underground space, which utilizes a convolutional filter modified to match camera distortion, includes: a communication unit configured to communicate with a movable body and receive an image of an underground facility captured by a camera; a convolution filter generation unit configured to generate the convolution filter that matches a distortion shape of the camera; a main control unit configured to correct the distortion by applying the convolutional filter generated in the convolution filter generation unit to the image of the underground facility; a feature extraction unit configured to generate a feature map through a convolution operation so as to infer a region from the image of which the distortion is corrected by the main control unit; and an object classification module configured to receive the feature map as an input so as to classify the objects contained in the inferred region.
In the system for detecting the object in the underground space, which utilizes the convolutional filter modified to match the camera distortion, when the convolutional filter is applied to the image of the underground facility, the main control unit may use the following [Equation 2]:

$$y(p_c) = \sum_{n=1}^{N} w(p_n) \cdot x(p_c + p_n) \quad \text{[Equation 2]}$$

where $x$ is an input of an i-th layer, $y$ is an output of the i-th layer, $y(p_c)$ is an output value of the convolution filter having $p_c$ at the center of the filter, $p_c$ is a position on a feature vector at which the filter center operation occurs, $w(p_n)$ is a weight at the $p_n$ position of the filter, $x(p_c+p_n)$ is an input value at the $p_n$ position relative to the $p_c$ position of the input feature vector, $N$ is the number of inputs used by the convolution filter for the operation, and $p_n$ is the n-th coordinate used by the convolution filter for the operation.
In the feature map generated through the feature extraction unit of the system for detecting the object in the underground space, which utilizes the convolutional filter modified to match the camera distortion, a plurality of anchor boxes having different sizes may be assigned to the object so as to detect the object.
The main control unit of the system for detecting the object in the underground space, which utilizes the convolutional filter modified to match the camera distortion, may correct the distortion by using the convolutional filter of the following [Equation 6]:

$$y(p_c) = \sum_{n=1}^{N} w(p_n) \cdot x(p_c + p_n + \Delta p_n) \quad \text{[Equation 6]}$$
In another embodiment, a method for detecting an object in an underground space, which utilizes a convolutional filter modified to match camera distortion, includes: (a) photographing an underground facility with a wide-angle camera mounted on a movable body; (b) acquiring the image of the underground facility photographed by the wide-angle camera and transmitting the image to an object detection terminal; (c) allowing the object detection terminal to receive the image of the underground facility and a parameter of the camera; (d) allowing the object detection terminal to generate the convolutional filter that matches the distortion of the camera by using the camera parameter; (e) allowing the object detection terminal to constitute an object detection model and learn the corresponding object detection model; (f) allowing the object detection terminal to receive a feature map as an input and classify which object is contained in the inferred region using a convolutional layer; and (g) allowing the object detection terminal to visualize the object as an object bounding box.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.
Terms or words used in this specification and claims should not be construed as being limited to their ordinary or dictionary meanings, but should be interpreted with meanings and concepts consistent with the technical spirit of the present invention, on the basis that an inventor may define terms appropriately to describe his or her invention in the best way.
Since the embodiments described in this specification and the configurations shown in the drawings are only the most preferred embodiments of the present invention and do not represent all of its technical ideas, it should be understood that various equivalents and modifications capable of substituting for them may exist at the time of this application.
Hereinafter, a system and method for detecting an object in an underground space using a convolutional filter modified to match camera distortion according to the present disclosure will be described in detail with reference to the accompanying drawings.
First, as illustrated in the accompanying drawings, the system may include a movable body 100, a wide-angle camera 200, a communication network 300, and an object detection terminal 400.
The movable body 100 may include a movable drone, robot, or RC car for diagnosing the underground facility. The wide-angle camera 200 may be mounted on the above-described various movable bodies 100 to photograph the underground facility.
The wide-angle camera 200 may use at least one of a wide-angle lens or an ultra-wide-angle lens, which has a wide field of view (FOV). The wide-angle camera may see a range wider than that of a camera using a general lens. As a result, however, as illustrated in the accompanying drawings, distortion occurs in the photographed image.
The movable body 100 may transmit an image photographed by the wide-angle camera 200 to the object detection terminal 400 through the communication network 300.
The object detection terminal 400 may receive the image photographed by the wide-angle camera 200 and detect an object that is a region of interest (ROI) set in the image. The object detection terminal may be a laptop computer, a desktop PC, or a tablet PC. The object detection terminal 400 may correct the image distortion into a normalized image and detect an object corresponding to the region of interest (ROI) in the corrected image.
As illustrated in the accompanying drawings, the object detection terminal 400 may include a communication unit, a main control unit 420, a convolution filter generation unit 430, a feature extraction unit 440, and an object classification module 450.
The convolution filter generation unit 430 may consider the camera parameters, which may be transmitted from the movable body, to generate a filter (here, the filter means the convolution filter). For example, the camera parameters may include a focal length, an angle of view, and a relative aperture of a lens. At least one of the angle of view and the relative aperture may also be related to components of the camera other than the lens. The convolution filter generation unit 430 may use a convolutional neural network (e.g., ResNet-101). For example, the feature extraction network may be defined as [Equation 1], a set of functions, that is, a set of convolution filters for extracting features.
$$f_i \in \{f_1, f_2, \ldots, f_l\}, \qquad F_i = f_i(x), \quad i = 1, 2, \ldots, l \quad \text{[Equation 1]}$$
In [Equation 1], $l$ may be the number of layers constituting the network, $F_i$ may be the feature vector passing through the i-th layer, $x$ may be the feature vector input to the i-th layer, and $f_i(\cdot)$ may be the operation of the i-th layer of a CNN network constituted by a sliding window algorithm.
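Purely as an illustration, [Equation 1] can be read as a chain of layer functions applied to an input feature vector. A minimal Python sketch under that reading, using the ResNet-101 backbone named above (the use of torchvision and the 512×512 input size are assumptions of this sketch, not part of the disclosure), might be:

```python
# Minimal sketch of the feature extraction network of [Equation 1]:
# a chain of layer functions f_i applied to an input x. ResNet-101 is
# used here only because it is named above as an example backbone.
import torch
import torchvision

backbone = torchvision.models.resnet101(weights=None)
# Keep only the convolutional stages, dropping the classification head,
# so the module maps an image to the feature map F_l = f_l(...f_1(x)).
layers = torch.nn.Sequential(*list(backbone.children())[:-2])

x = torch.randn(1, 3, 512, 512)   # a (batch of one) distorted wide-angle image
feature_map = layers(x)           # F_l with shape (1, 2048, 16, 16)
print(feature_map.shape)
```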
The main control unit 420 may apply the convolutional filter generated by the convolutional filter generation unit 430 to the distorted image. Here, the distorted image may be a distorted image of the underground facility.
The convolution filter may be provided using [Equation 2] below, assuming the i-th layer:

$$y(p_c) = \sum_{n=1}^{N} w(p_n) \cdot x(p_c + p_n) \quad \text{[Equation 2]}$$

In [Equation 2], $x$ may be an input of the i-th layer, $y$ may be an output of the i-th layer, $y(p_c)$ may be an output value of the convolution filter having $p_c$ at the center of the filter, $p_c$ may be a position on a feature vector at which the filter center operation occurs, $w(p_n)$ may be a weight at the $p_n$ position of the filter, $x(p_c+p_n)$ may be an input value at the $p_n$ position relative to the $p_c$ position of the input feature vector, and $N$ may be the number of inputs used by the convolution filter for the operation. $p_n$ may be the n-th coordinate used by the convolution filter for the operation (e.g., in the case of a 3×3 convolution filter, one of (−1, −1), (0, −1), (1, −1), (−1, 0), (0, 0), (1, 0), (−1, 1), (0, 1), (1, 1)).
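As an illustrative sketch of [Equation 2] only (not the terminal's implementation), the sum over filter taps can be transcribed directly in NumPy for a single-channel input; the function name and the no-padding convention are assumptions of this sketch:

```python
import numpy as np

def conv2d_eq2(x, w):
    """Direct transcription of [Equation 2]: y(p_c) = sum_n w(p_n) x(p_c + p_n),
    for a single-channel input x and an odd-sized filter w (no padding)."""
    kh, kw = w.shape
    # p_n ranges over filter coordinates relative to the center, e.g.
    # (-1, -1) ... (1, 1) for a 3x3 filter, as in the example above.
    offsets = [(i - kh // 2, j - kw // 2) for i in range(kh) for j in range(kw)]
    h, wd = x.shape
    y = np.zeros((h - kh + 1, wd - kw + 1))
    for ci in range(y.shape[0]):
        for cj in range(y.shape[1]):
            pc = (ci + kh // 2, cj + kw // 2)   # filter-center position p_c
            y[ci, cj] = sum(
                w[pn[0] + kh // 2, pn[1] + kw // 2] * x[pc[0] + pn[0], pc[1] + pn[1]]
                for pn in offsets
            )
    return y

y = conv2d_eq2(np.arange(25.0).reshape(5, 5), np.ones((3, 3)) / 9.0)  # 3x3 output
```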
The main control unit 420 may correct the distortion by applying the convolutional filter to the distorted image photographed by the wide-angle camera 200.
When the main control unit 420 corrects the distortion of the distorted image, the feature extraction unit 440 may calculate feature maps from the corrected image through the convolution operation so as to perform region inference.
Thereafter, the main control unit 420 may infer regions with high probability, in which the object will exist, from the feature map calculated by the feature extraction unit 440 by using a region proposal network (RPN) algorithm.
Each of the anchor boxes may have a predetermined shape of an object to be detected. For example, if a vehicle is an object to be detected, an anchor box for the vehicle may be predetermined; anchor boxes for people, pipes, lanes, and cabinets may be predetermined in the same manner. The anchor box may refer to a method of assigning the most similar one among a plurality of different boxes when an object is detected.
An anchor box may mean a box in which a predetermined object is likely to be present. Anchor boxes may play several roles in object detection; for example, they may be used to detect overlapping objects. For example, a vehicle extending in a lateral direction and a person extending in a longitudinal direction may overlap each other. Here, both objects may be detected using several, tens, thousands, or tens of thousands of anchor boxes having various aspect ratios and scales, as in the sketch below.
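A minimal sketch of how such anchor boxes with various scales and aspect ratios could be generated at one position follows; the scale and ratio values are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def anchors_at(cx, cy, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes (x1, y1, x2, y2) centered at (cx, cy) with
    several scales and aspect ratios, as described above."""
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)   # wide boxes for r > 1 (e.g., a vehicle seen laterally)
            h = s / np.sqrt(r)   # tall boxes for r < 1 (e.g., a standing person)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

print(anchors_at(100, 100).shape)   # (9, 4): 3 scales x 3 ratios
```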
The main control unit 420 may move a sliding window having the same size over the feature map. When the sliding window is applied to the center of an anchor box, it may be determined whether an object is included in the anchor box. For example, for the acquired anchor box, a separate convolutional layer may be used to perform binary classification for determining whether the anchor box contains an object. When the acquired anchor box contains an object, accurate bounding box coordinates may be predicted by applying bounding box regression, as in the sketch below.
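The sliding-window scoring described above matches the usual structure of a region proposal network head: a shared 3×3 convolution slides over the feature map, one 1×1 convolution performs the binary object/background classification per anchor, and another regresses bounding-box offsets. A minimal sketch under that reading follows; the channel counts and the anchor count k are illustrative assumptions:

```python
import torch
from torch import nn

class RPNHead(nn.Module):
    """Sliding-window objectness scoring and bounding-box regression
    over k anchors per feature-map position, as described above."""
    def __init__(self, in_channels=2048, k=9):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, 256, kernel_size=3, padding=1)
        self.objectness = nn.Conv2d(256, k, kernel_size=1)      # binary score per anchor
        self.bbox_reg = nn.Conv2d(256, 4 * k, kernel_size=1)    # box offsets per anchor

    def forward(self, feature_map):
        t = torch.relu(self.shared(feature_map))
        return self.objectness(t), self.bbox_reg(t)

scores, deltas = RPNHead()(torch.randn(1, 2048, 16, 16))
```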
Thereafter, a region in which the object is disposed may be extracted through region of interest (RoI) pooling, and the region may be converted into a feature map having a fixed size, as in the sketch below.
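For illustration, torchvision's roi_pool can serve as a stand-in for the region of interest pooling described above; the box coordinates, channel count, and spatial scale below are assumptions of this sketch:

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 256, 16, 16)
# Proposed regions in image coordinates: (batch_index, x1, y1, x2, y2).
rois = torch.tensor([[0, 32.0, 32.0, 160.0, 224.0]])
# spatial_scale maps image coordinates onto the 16x16 feature map
# (16/512 for an assumed 512-pixel image); each region is pooled to a
# fixed 7x7 feature map regardless of its original size.
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=16 / 512)
print(pooled.shape)   # (1, 256, 7, 7)
```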
The object classification module 450 may receive the feature map as an input and classify what kind of object is contained in the inferred region by using a convolutional layer.
The present disclosure may provide a system that does not require preprocessing. In the system and method for detecting the underground facility using the convolutional filter modified to match the camera distortion according to the present disclosure, the object may be detected using the modified convolutional filter without preprocessing. Here, the preprocessing refers to a process of removing the distortion of the distorted image before it passes through the feature extraction unit 440. In this embodiment, since the distorted image is used as it is, the preprocessing may not be necessary.
Hereinafter, the system that does not require the preprocessing will be described in detail.
When image optical distortion occurs, the coordinate pair (x′, y′) of a distorted point corresponding to an original point (x, y) may be expressed by [Equation 3] below.
$$x' = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$
$$y' = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \quad \text{[Equation 3]}$$
As a result, when image optical distortion occurs, each point is moved to the distorted coordinate pair given by [Equation 3] above. Thus, performance of detection of the underground facility may be deteriorated.
The displacement (Δx, Δy) of the coordinates distorted by [Equation 3] may be expressed as in [Equation 4] below.
$$\Delta x = -x\,(k_1 r^2 + k_2 r^4 + k_3 r^6)$$
$$\Delta y = -y\,(k_1 r^2 + k_2 r^4 + k_3 r^6) \quad \text{[Equation 4]}$$
In [Equation 3] and [Equation 4], each $k$ may be a lens-related parameter, that is, a parameter derived from the camera parameters such as the focal length, the angle of view, and the relative aperture. To compensate for the distortion displacement (Δx, Δy) from the viewpoint of the convolutional filter, the distortion may be corrected through a modified sampling position of the filter. $\Delta p_n$ may be the modification of the filter position $p_n$ used for the calculation and may be calculated by [Equation 5] below.
$$\Delta p_n = \Delta p_{n+c} - \Delta p_c \quad \text{[Equation 5]}$$
Here, $\Delta p_{n+c}$ and $\Delta p_c$ are the displacements (Δx, Δy) of [Equation 4] evaluated at the positions $p_c + p_n$ and $p_c$, respectively.
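A minimal sketch of [Equation 4] and [Equation 5] follows; the k values are illustrative, and r is taken as the normalized distance from the image center, a standard reading of the radial model that the text leaves implicit:

```python
import numpy as np

K1, K2, K3 = -0.30, 0.08, -0.01   # illustrative lens parameters k1, k2, k3

def delta(p, center):
    """[Equation 4]: distortion displacement (dx, dy) at pixel p, with r
    taken as the normalized distance of p from the image center (an
    assumption of this sketch; the text leaves r implicit)."""
    x, y = (p[0] - center[0]) / center[0], (p[1] - center[1]) / center[1]
    r2 = x * x + y * y
    s = K1 * r2 + K2 * r2**2 + K3 * r2**3
    return np.array([-x * s * center[0], -y * s * center[1]])   # back to pixels

def delta_pn(pc, pn, center):
    """[Equation 5]: offset for filter tap p_n at filter center p_c,
    Delta p_n = Delta(p_c + p_n) - Delta(p_c)."""
    return delta(pc + pn, center) - delta(pc, center)

center = np.array([256.0, 256.0])
pc = np.array([400.0, 100.0])            # a position far from the image center
print(delta_pn(pc, np.array([1.0, 1.0]), center))
```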
When substituting [Equation 5] into [Equation 2], a convolutional filter operation expression such as [Equation 6] below may be obtained:

$$y(p_c) = \sum_{n=1}^{N} w(p_n) \cdot x(p_c + p_n + \Delta p_n) \quad \text{[Equation 6]}$$
The main control unit 420 of the system for detecting the underground facility using the convolution filter modified to match the camera distortion according to the present disclosure may use the convolution filter of [Equation 6] instead of the convolution filter of [Equation 2]. As a result, the preprocessing may not be necessary. In other words, in $f_1$ (the function of the first layer of the convolutional layers) of [Equation 1], which receives the distorted image as an input, the modified convolutional filter of [Equation 6], that is, the convolutional filter modified to match the camera distortion, may be used instead of the convolutional filter of [Equation 2]. As a result, the distortion of an image photographed by the wide-angle camera 200 may be resolved.
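One way to realize the modified filter of [Equation 6] in practice, offered here only as an illustrative sketch, is an off-the-shelf deformable convolution whose per-tap offsets $\Delta p_n$ are precomputed from the camera distortion model rather than learned. The zero offsets below are placeholders that reduce [Equation 6] to [Equation 2]; in use they would be filled from [Equation 5]:

```python
import torch
from torchvision.ops import deform_conv2d

N, C, H, W = 1, 3, 64, 64
weight = torch.randn(8, C, 3, 3)   # 8 output channels, 3x3 filter weights w(p_n)
x = torch.randn(N, C, H, W)        # distorted input image

# Offsets Delta p_n from [Equation 5], one (dy, dx) pair per filter tap and
# per output position: shape (N, 2*3*3, H, W). Here they would be computed
# from the camera's distortion model (see the sketch above); zeros reduce
# [Equation 6] to the ordinary convolution of [Equation 2].
offsets = torch.zeros(N, 2 * 3 * 3, H, W)

y = deform_conv2d(x, offsets, weight, padding=1)   # [Equation 6]
print(y.shape)   # (1, 8, 64, 64)
```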
A detection method performed by the system for detecting the object in the underground space using the convolutional filter modified to match the camera distortion according to the present disclosure, which has the configuration described above, will now be described.
The wide-angle camera 200 mounted on the movable body 100 moving in the underground space may perform a process (S100) of photographing an underground facility.
The movable body 100 may perform a process of acquiring the image of the underground facility photographed by the wide-angle camera 200 and transmitting the image to the object detection terminal 400 (S200).
The object detection terminal 400 may perform a process of receiving, from the movable body 100, the image of the underground facility photographed by the wide-angle camera 200 and the parameters of the camera used at this time (S300). Alternatively, the image and the parameters may be stored in advance instead of being received.
The convolution filter generation unit 430 of the object detection terminal 400 may perform a process of generating a convolution filter that matches a distortion shape of the camera by using the camera parameters (S400).
The object detection terminal 400 may constitute an object detection model and perform a process of learning the object detection model (S500).
The object detection terminal 400 may perform a process of receiving the feature map as an input and classifying which object is contained in the inferred region by using a convolutional layer (S600).
The object detection terminal 400 may perform a process of visualizing the object as an object bounding box (S700).
According to the embodiment, the underground facility may be more accurately detected.
The system and method for detecting the underground facility using the convolutional filter modified to match the camera distortion according to the present disclosure may have the effect of robustly detecting an object in an image, such as that of a wide-angle or omnidirectional camera, in which the size of the object formed on the image changes greatly according to the distortion and the distance.
The system and method for detecting the underground facility using the convolutional filter modified to match the camera distortion according to the present disclosure may have the effect of detecting and diagnosing the facilities or installations disposed on both sides at once, because the facilities over a wide region are detected using only one wide-angle device.
The system and method for detecting the underground facility using the convolutional filter modified to match the camera distortion according to the present disclosure may have the effect of being adapted in response to all the camera-specific parameters, detecting the same object even if its size or shape differs, and detecting objects on an image having a wide region with excellent performance by using the wide-angle camera.
Although embodiments have been described with reference to a number of illustrative embodiments thereof, it should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure. More particularly, various variations and modifications are possible in the component parts and/or arrangements of the subject combination arrangement within the scope of the disclosure, the drawings and the appended claims. In addition to variations and modifications in the component parts and/or arrangements, alternative uses will also be apparent to those skilled in the art.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10-2022-0163299 | Nov. 29, 2022 | KR | national |