SYSTEM AND METHOD FOR DETECTING OBJECT IN UNDERGROUND SPACE

Information

  • Patent Application
  • Publication Number
    20240177444
  • Date Filed
    August 23, 2023
  • Date Published
    May 30, 2024
Abstract
Provided is a system for detecting an object in an underground space. The system includes a movable body that moves in the underground space; a wide-angle camera mounted on the movable body to photograph an underground facility in the underground space; an object detection terminal configured to receive an image of the underground facility photographed by the wide-angle camera and to correct the distorted image so as to detect, within the corrected image, the underground facility corresponding to a set object; and a communication network configured to enable network communication between the movable body and the object detection terminal. Thus, the system can robustly detect an object in an image in which the size of the imaged object changes greatly with distortion and distance, such as an image from a wide-angle or omnidirectional camera.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority under 35 U.S.C. 119 and 35 U.S.C. 365 to Korean Patent Application No. 10-2022-0163299 (filed on 29 Nov. 2022), which is hereby incorporated by reference in its entirety.


BACKGROUND

The present disclosure relates to a system and method for detecting an object in an underground space, in which a convolutional filter modified to match camera distortion is utilized. The modified filter increases the detection rate of an object that is a region of interest (ROI) by applying a deep-learning detection model directly to underground-facility images photographed from a movable body, without first correcting objects that show large distortion or size changes, so that the underground facility can be diagnosed.


In a method for detecting an object according to the related art, the object is detected by an algorithm that is applicable only when the degree of distortion or the size change of the object in the image is small. Korean Patent Registration No. 10-2303399 (Sep. 13, 2021) is one such related technology. Such methods are therefore difficult to apply when the distortion of the object, or the degree of its change, is large. An existing object detection algorithm may be applied to an image from a wide-angle camera having a wide region of interest (ROI), but its detection performance deteriorates due to distortion and the change in object size with distance. In an image from a general camera having a narrow ROI, the image change is small, so only a small range of objects can be detected in one pass of an object detection model.


In particular, the algorithm for detecting the object according to the related art uses a fixed convolutional filter even when the same object appears with a different size or shape. For this reason, the object detection rate decreases.


SUMMARY

Embodiments provide a system and method for detecting an underground facility that use a convolutional filter modified to match camera distortion, capable of robustly detecting an object in an image in which the size of the imaged object changes greatly with distortion and distance, such as an image from a wide-angle or omnidirectional camera.


Embodiments also provide a system and method for detecting an underground facility, in which the convolutional filter modified to match camera distortion is adapted in response to every camera-specific parameter, detects the same object even when its size or shape differs, and detects objects over a wide image region with excellent performance by using a wide-angle camera.


Embodiments also provide a system and method for detecting an underground facility, in which a convolutional filter modified to match camera distortion is utilized to detect and diagnose facilities or installations disposed on both surfaces at once, because facilities over a wide region are detected using only one wide-angle device.


In one embodiment, an object detection terminal of a system for detecting an object in an underground space, which utilizes a convolutional filter modified to match camera distortion, includes: a communication unit configured to communicate with a movable body and receive an image of an underground facility captured by a camera; a convolution filter generation unit configured to generate the convolution filter that matches a distortion shape of the camera; a main control unit configured to correct the distortion by applying the convolutional filter generated by the convolution filter generation unit to the image of the underground facility; a feature extraction unit configured to generate a feature map through a convolution operation so as to infer a region from the image whose distortion has been corrected by the main control unit; and an object classification module configured to receive the feature map as an input so as to classify the objects contained in the inferred region.


In the main control unit of the system for detecting the object in the underground space, which utilizes the convolutional filter modified to match the camera distortion, when the convolutional filter is applied to the image of the underground facility, the main control unit may use the following Equation 2:

y(pc) = Σ_{pn∈N} w(pn)·x(pc+pn),
where x is an input of the i-th layer, y is an output of the i-th layer, y(pc) is an output value of the convolution filter comprising pc at the center of the filter, pc is a position on the feature vector at which the filter center operation occurs, w(pn) is a weight at the pn position of the filter, x(pc+pn) is an input value at the pn position relative to the pc position of the input feature vector, N is the number of inputs used by the convolution filter in the operation, and pn is the n-th coordinate used by the convolution filter in the operation.


In the feature map generated through the feature extraction unit of the system for detecting the object in the underground space, which utilizes the convolutional filter modified to match the camera distortion, a plurality of anchor boxes having different sizes may be assigned to the object so as to detect the object.


The main control unit of the system for detecting the object in the underground space, which utilizes the convolutional filter modified to match the camera distortion, may correct the distortion by using the convolutional filter of the following Equation 6:

y(pc) = Σ_{pn∈N} w(pn)·x(pc+pn+Δpn)

In another embodiment, a method for detecting an object in an underground space, which utilizes a convolutional filter modified to match camera distortion, includes: (a) photographing an underground facility by a wide-angle camera mounted on a movable body; (b) acquiring an image of the underground facility photographed by the wide-angle camera and transmitting the image to an object detection terminal; (c) allowing the object detection terminal to receive the image of the underground facility and a parameter of the camera; (d) allowing the object detection terminal to generate the convolutional filter that matches the distortion of the camera using the camera parameter; (e) allowing the object detection terminal to construct an object detection model and learn the corresponding object detection model; (f) allowing the object detection terminal to classify which object is included in an inferred region using a feature map and to classify which object is included in the corresponding region using a convolutional layer; and (g) allowing the object detection terminal to visualize the object as an object bounding box.


The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a view illustrating a system for detecting an object in an underground space using a convolutional filter modified according to camera distortion according to the present disclosure.



FIG. 2 is a view illustrating an example of a distorted image photographed by a wide-angle camera of the system for detecting the object in the underground space using the convolutional filter modified to match camera distortion according to the present disclosure.



FIG. 3 is a block diagram illustrating an object detection terminal of the system for detecting the object in the underground space using the convolutional filter modified to match the camera distortion according to the present disclosure.



FIG. 4 is a view illustrating an example of applying the convolutional filter of the system for detecting the object in the underground space using the convolutional filter modified to match the camera distortion according to the present disclosure.



FIG. 5 is a view for explaining an object detection model schematic and an anchor box estimation algorithm of the system for detecting the object in the underground object using the convolutional filter modified to match the camera distortion according to the present disclosure.



FIG. 6 is a flowchart illustrating a method for detecting an object in an underground space using a convolutional filter modified according to camera distortion according to the present disclosure.





DETAILED DESCRIPTION OF THE EMBODIMENTS

Terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings; they should be interpreted with the meaning and concept consistent with the technical spirit of the present invention, on the basis that an inventor may define terms in order to describe his or her invention in the best way.


Since the embodiments described in this specification and the configurations shown in the drawings are merely preferred embodiments of the present invention and do not represent all of its technical ideas, it should be understood that various equivalents and modifications could have been substituted for them at the time of this application.


Hereinafter, a system and method for detecting an object in an underground space using a convolutional filter modified to match camera distortion according to the present disclosure will be described in detail with reference to the accompanying drawings.


First, as illustrated in FIG. 1, a system for detecting a facility using a convolutional filter modified to match camera distortion according to the present disclosure may include a movable body 100, a wide-angle camera 200, a communication network 300, and an object detection terminal 400.


The movable body 100 may include a movable drone, robot, or RC car for diagnosing the underground facility. The wide-angle camera 200 may be mounted on the above-described various movable bodies 100 to photograph the underground facility.


The wide-angle camera 200 may use at least one of a wide-angle lens or an ultra-wide-angle lens, which has a wide field of view (FOV). The wide-angle camera may see a range wider than that of a camera using a general lens. However, as a result, as illustrated in FIG. 2, image distortion may be severe.


The movable body 100 may transmit an image photographed by the wide-angle camera 200 to the object detection terminal 400 through the communication network 300.


The object detection terminal 400 may receive an image photographed by the wide-angle camera 200 and detect an object that is a region of interest (ROI) set in the image. The object detection terminal may be a laptop computer, a desktop PC, or a tablet PC. The object detection terminal 400 may correct the image distortion into a normalized image and detect the object corresponding to the ROI in the corrected image.


As illustrated in FIG. 3, the object detection terminal 400 may include a communication unit 410, a main control unit 420, a convolution filter generation unit 430, and a feature extraction unit 440. The communication unit 410 may communicate with the movable body 100, so that the terminal 400 receives an image of the underground facility photographed by the wide-angle camera 200. The convolution filter generation unit 430 may generate a convolution filter that matches a distortion shape of the wide-angle camera 200. Here, 'matching' means corresponding to the distortion of the wide-angle camera, for example, corresponding to a camera parameter.


The convolution filter generation unit 430 may consider the camera parameter, transmitted from the movable body, to generate the convolution filter. For example, the camera parameter may include a focal length, an angle of view, and a relative aperture of a lens; the angle of view and the relative aperture may also be related to components of the camera other than the lens. The convolution filter generation unit 430 may use a convolutional neural network (e.g., ResNet-101). For example, the feature extraction network may be defined as [Equation 1], a set of functions, that is, a set of convolution filters for extracting features.





fnet = ⟨f1, f2, . . . , fl⟩

Fi = fi(x), i = 1, 2, . . . , l   [Equation 1]

In [Equation 1], l may be the number of layers constituting the network, Fi may be the feature vector output by the i-th layer, x may be the feature vector input to the i-th layer, and fi(·) may be the operation of the i-th layer of a CNN network constructed with a sliding-window algorithm.
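The layer-by-layer structure of [Equation 1] can be sketched as an ordered list of layer functions applied in sequence. The toy ReLU layers below are illustrative stand-ins, not the ResNet-101 layers mentioned above.

```python
import numpy as np

def make_layer(weight):
    """Return a toy layer f_i: an affine map followed by ReLU."""
    def f(x):
        return np.maximum(0.0, weight * x)
    return f

def feature_network(layers, x):
    """F_l = f_l(f_{l-1}(... f_1(x))): pass x through every layer in order."""
    for f in layers:
        x = f(x)
    return x

# Three hypothetical layers with scalar weights.
layers = [make_layer(w) for w in (2.0, 0.5, 3.0)]
x = np.array([1.0, -1.0, 2.0])
F = feature_network(layers, x)  # feature vector after the last layer
```

Each `f_i` consumes the feature vector produced by the previous layer, matching the composition expressed by [Equation 1].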


The main control unit 420 may apply the convolutional filter generated by the convolutional filter generation unit 430 to the distorted image. Here, the distorted image may be a distorted image of the underground facility.


The convolution filter may be provided using [Equation 2] below.

y(pc) = Σ_{pn∈N} w(pn)·x(pc+pn)   [Equation 2]
The i-th layer may be assumed. In this case, in [Equation 2], x may be an input of the i-th layer, y may be an output of the i-th layer, y(pc) may be an output value of the convolution filter including pc at a center of the filter, pc may be a position on a feature vector at which a filter center operation occurs, w(pn) may be a weight at a pn position of the filter, x(pc+pn) may be an input value at the pn position based on the pc position of the input feature vector, and N may be the number of inputs for the convolution filter to be used for operation.


In [Equation 2], pn may be the n-th coordinate used by the convolution filter in the operation (e.g., in the case of a 3×3 convolution filter, one of (−1, −1), (0, −1), (1, −1), (−1, 0), (1, 0), (−1, 1), (0, 1), (1, 1)).
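The operation of [Equation 2] can be sketched as a plain convolution that sums weighted inputs over the filter offsets around the center position pc. The 3×3 averaging filter and the input grid below are illustrative values.

```python
import numpy as np

# Relative coordinates p_n of a 3x3 filter, including the center (0, 0).
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def conv_at(x, w, pc):
    """y(p_c) = sum over p_n of w(p_n) * x(p_c + p_n), p_c = (row, col)."""
    r, c = pc
    total = 0.0
    for (dy, dx), wn in zip(OFFSETS, w.ravel()):
        total += wn * x[r + dy, c + dx]   # w(p_n) * x(p_c + p_n)
    return total

x = np.arange(25, dtype=float).reshape(5, 5)  # toy single-channel image
w = np.ones((3, 3)) / 9.0                     # averaging filter
y_center = conv_at(x, w, (2, 2))
```

Sliding `pc` over every valid position of the input produces the full output feature map.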


The main control unit 420 may correct the distortion by applying the convolutional filter to the distorted image photographed by the wide-angle camera 200.


When the main control unit 420 corrects the distortion of the distorted image, the feature extraction unit 440 may calculate feature maps through the convolution operation to perform region inference. The feature extraction unit may use the corrected image.


Thereafter, the main control unit 420 may infer regions with high probability, in which the object will exist, from the feature map calculated by the feature extraction unit 440 by using a region proposal network (RPN) algorithm. FIG. 5A illustrates the RPN. Here, to detect objects having various sizes, an object region may be inferred by utilizing anchor boxes having various sizes. FIG. 5B illustrates the anchor boxes having various sizes.


Each of the anchor boxes may have a predetermined shape for an object to be detected. For example, if a vehicle is the object to be detected, an anchor box for the vehicle may be predetermined; the same applies to people, pipes, lanes, and cabinets. The anchor box may refer to a method of assigning the most similar one among a plurality of different boxes when an object is detected.


An anchor box may be understood as a box in which a predetermined object is likely to appear. Anchor boxes play several roles in object detection; for example, they may be used to detect overlapping objects, such as a vehicle extending in a lateral direction overlapping a person extending in a longitudinal direction. Here, both objects may be detected by using anywhere from several to tens of thousands of anchor boxes having various aspect ratios and scales.
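Generating anchor boxes of various aspect ratios and scales at one feature-map location can be sketched as follows; the scale and ratio values are illustrative assumptions, not parameters from the disclosure.

```python
def make_anchors(cx, cy, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchor boxes centered at (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * (r ** 0.5)   # width grows with sqrt(ratio)
            h = s / (r ** 0.5)   # height shrinks, keeping area = s * s
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# Nine candidate boxes at one location: tall, square, and wide shapes
# at three scales, so a person and a vehicle can each find a close match.
anchors = make_anchors(100.0, 100.0)
```

Repeating this at every sliding-window position yields the dense set of candidates that the region proposal stage scores.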


The main control unit 420 may move a sliding window having a fixed size. When the sliding window is applied at the center of an anchor box, it may be determined whether an object is included in the anchor box. For example, for each acquired anchor box, a separate convolutional layer may be used to perform binary classification that determines whether the anchor box contains an object. When it does, accurate bounding box coordinates may be predicted by applying bounding box regression.


Thereafter, the region in which the object is disposed may be extracted through region-of-interest pooling and converted into a feature map having a fixed size.
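Region-of-interest pooling, which converts a variable-size region into a fixed-size feature map, can be sketched as a per-bin max pool. The equal-split binning below is a simplifying assumption.

```python
import numpy as np

def roi_max_pool(feat, r0, c0, r1, c1, out=2):
    """Max-pool feat[r0:r1, c0:c1] into a fixed out x out grid."""
    region = feat[r0:r1, c0:c1]
    rows = np.array_split(np.arange(region.shape[0]), out)
    cols = np.array_split(np.arange(region.shape[1]), out)
    pooled = np.empty((out, out))
    for i, rs in enumerate(rows):
        for j, cs in enumerate(cols):
            pooled[i, j] = region[np.ix_(rs, cs)].max()  # max per bin
    return pooled

feat = np.arange(36, dtype=float).reshape(6, 6)  # toy feature map
pooled = roi_max_pool(feat, 0, 0, 4, 6)          # 4x6 region -> 2x2 output
```

Regions of any shape collapse to the same `out x out` size, so the classifier that follows can use fixed-size inputs.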


The object classification module 450 may perform at least one of receiving the feature map as an input, classifying what kind of object is contained in the inferred region, or classifying which objects are contained in the region by using the convolutional layer. The object classification module 450 may also perform all of these processes.


The present disclosure may provide a system that requires no preprocessing. In the system and method for detecting the underground facility using the convolutional filter modified to match the camera distortion according to the present disclosure, the object may be detected using the modified convolutional filter without preprocessing. Here, preprocessing refers to removing the distortion of the distorted image, that is, the processing otherwise required to handle the camera distortion before the image passes through the feature extraction unit 440. In this embodiment, since the distorted image is used as it is, such preprocessing is unnecessary.


Hereinafter, the system without the need for the preprocessing will be described in detail.


When image optical distortion occurs, a point at a position (x, y) is moved to a distorted coordinate pair, which may be expressed by [Equation 3] below.

xd = x(1 + k1r² + k2r⁴ + k3r⁶)

yd = y(1 + k1r² + k2r⁴ + k3r⁶)   [Equation 3]


As a result, when the image optical distortion occurs, each point is moved to the distorted coordinate pair of Equation 3 above. Thus, performance in detecting the underground facility may deteriorate.


The displacement of the x, y coordinates distorted by [Equation 3] may be expressed as in [Equation 4] below.

Δx = −x(k1r² + k2r⁴ + k3r⁶)

Δy = −y(k1r² + k2r⁴ + k3r⁶)   [Equation 4]


In [Equation 3] and [Equation 4], each k may be a lens-related parameter, that is, a camera parameter such as a focal length, an angle of view, or a relative aperture value. To resolve the distortion coordinates (Δx, Δy) from the viewpoint of the convolutional filter, the distortion may be corrected through a modified filter position Δpn. Δpn may be the modification of the position pn used for the calculation and may be calculated by [Equation 5] below.





Δpn=Δpn+c−Δpc   [Equation 5]


Here, each of the Δpn+c and the Δpc is a coordinate (Δx, Δy) for each position.
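The radial-distortion model of [Equation 3] and the correction offsets of [Equation 4] can be sketched as follows; the coefficient values are illustrative, not taken from the disclosure.

```python
def distort(x, y, k1, k2, k3):
    """Distorted coordinates (xd, yd) per [Equation 3]."""
    r2 = x * x + y * y                       # r^2 from the image center
    factor = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    return x * factor, y * factor

def correction(x, y, k1, k2, k3):
    """Correction offsets (dx, dy) per [Equation 4]."""
    r2 = x * x + y * y
    poly = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    return -x * poly, -y * poly

# Illustrative coefficients: mild barrel distortion (k1 < 0).
xd, yd = distort(0.5, 0.5, -0.1, 0.0, 0.0)
dx, dy = correction(0.5, 0.5, -0.1, 0.0, 0.0)
```

Adding the [Equation 4] offset to the distorted coordinate recovers the original point, which is exactly the displacement the modified filter positions absorb.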


When substituting [Equation 5] into [Equation 2], a convolutional filter operation expression such as [Equation 6] below may be calculated.

y(pc) = Σ_{pn∈N} w(pn)·x(pc+pn+Δpn)   [Equation 6]

The main control unit 420 of the system for detecting the underground facility using the convolution filter modified to match the camera distortion according to the present disclosure may use the convolution filter of [Equation 6] instead of the convolution filter of [Equation 2]. As a result, the preprocessing is not necessary. In other words, in f1 (the function of the first layer of the convolutional layers) of [Equation 1], which receives the distorted image as an input, the modified convolutional filter of [Equation 6] is used instead of the convolutional filter of [Equation 2]. As a result, the distortion of an image photographed by the wide-angle camera 200 may be resolved.
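The modified convolution of [Equation 6] can be sketched by reading each filter tap at pc + pn + Δpn instead of the rigid grid position. Integer displacements are used here for simplicity; a full implementation would interpolate sub-pixel positions derived from the camera parameters.

```python
import numpy as np

# Relative coordinates p_n of a 3x3 filter, including the center (0, 0).
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def deformed_conv_at(x, w, pc, dp):
    """y(p_c) = sum of w(p_n) * x(p_c + p_n + dp_n), dp aligned with OFFSETS."""
    r, c = pc
    total = 0.0
    for (dy, dx), (ddy, ddx), wn in zip(OFFSETS, dp, w.ravel()):
        total += wn * x[r + dy + ddy, c + dx + ddx]
    return total

x = np.arange(49, dtype=float).reshape(7, 7)  # toy distorted image
w = np.ones((3, 3)) / 9.0                     # averaging filter
no_shift = [(0, 0)] * 9                       # reduces to [Equation 2]
shifted = [(1, 0)] * 9                        # every tap reads one row lower
y0 = deformed_conv_at(x, w, (3, 3), no_shift)
y1 = deformed_conv_at(x, w, (3, 3), shifted)
```

With all displacements zero the operation is the ordinary convolution of [Equation 2]; nonzero Δpn values bend the sampling grid to follow the lens distortion.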


A detection method performed by the system for detecting the object in the underground space using the convolutional filter modified to match the camera distortion according to the present disclosure, which has the configuration described above, will now be described.


The wide-angle camera 200 mounted on the movable body 100 moving in the underground space may perform a process (S100) of photographing an underground facility.


The movable body 100 may perform a process of acquiring an image of the underground facility, which is photographed by the wide-angle camera 200 to transmit the image to the object detection terminal 400 (S200).


The object detection terminal 400 may perform a process of receiving, from the movable body 100, the image of the underground facility photographed by the wide-angle camera 200 and the parameters of the camera used at this time (S300). The image and the parameters may instead be stored in advance rather than received.


The convolution filter generation unit 430 of the object detection terminal 400 may perform a process of generating a convolution filter that matches a distortion shape of the camera by using the camera parameters (S400).


The object detection terminal 400 may construct an object detection model and perform a process of learning the object detection model (S500).


The object detection terminal 400 may perform a process of receiving a feature map as an input, classifying the object contained in the inferred region, and classifying which object is contained in the corresponding region using a convolutional layer (S600).


The object detection terminal 400 may perform a process of visualizing the object as an object bounding box (S700).
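Steps S100 to S700 above can be sketched as one pipeline. Every function name below is a hypothetical placeholder standing in for the corresponding stage, not an API defined by the disclosure.

```python
def detect_objects(image, camera_params, *, make_filter, run_model, visualize):
    """Mirror of the flow: build the filter (S400), run the learned model
    to infer and classify regions (S500-S600), draw boxes (S700)."""
    conv_filter = make_filter(camera_params)    # S400: filter from parameters
    detections = run_model(image, conv_filter)  # S500-S600: detect + classify
    return visualize(image, detections)         # S700: bounding-box overlay

# Toy stand-ins that only show the call shape of each stage.
result = detect_objects(
    "frame.png",
    {"focal_length": 2.8},
    make_filter=lambda p: ("filter", p["focal_length"]),
    run_model=lambda img, f: [("pipe", (10, 20, 50, 60))],
    visualize=lambda img, dets: {"image": img, "boxes": dets},
)
```

Each stage can be swapped independently, which mirrors the separation between the convolution filter generation unit, the detection model, and the visualization step in the text.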


According to the embodiment, the underground facility may be more accurately detected.


The system and method for detecting the underground facility using the convolutional filter modified to match the camera distortion according to the present disclosure may robustly detect an object in an image in which the size of the imaged object changes greatly with distortion and distance, such as an image from a wide-angle or omnidirectional camera.


The system and method for detecting the underground facility using the convolutional filter modified to match the camera distortion according to the present disclosure may have the effect of detecting and diagnosing the facilities or establishments disposed on both surfaces at once because the facilities on the wide region are detected using only one wide-angle device.


The system and method for detecting the underground facility using the convolutional filter modified to match the camera distortion according to the present disclosure may have the effects of being modified in response to all camera-specific parameters, of detecting the same object even if its size or shape differs, and of detecting objects on an image having a wide region with excellent performance by using the wide-angle camera.


Although embodiments have been described with reference to a number of illustrative embodiments thereof, it should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure. More particularly, various variations and modifications are possible in the component parts and/or arrangements of the subject combination arrangement within the scope of the disclosure, the drawings and the appended claims. In addition to variations and modifications in the component parts and/or arrangements, alternative uses will also be apparent to those skilled in the art.

Claims
  • 1. A system for detecting an object in an underground space, the system comprising: a movable body moving in the underground space; a camera mounted on the movable body to photograph a first image containing at least one object placed in the underground space; an object detection terminal configured to receive the first image and correct the first image so as to detect a first object that is a preset object and is comprised in the at least one object and a second object that is different from the first object, is a preset object, and is comprised in the at least one object within a corrected second image; and a communication network configured to enable network communication between the movable body and the object detection terminal.
  • 2. The system according to claim 1, wherein each of the first object and the second object comprises at least one of a person, a vehicle, or an underground facility.
  • 3. The system according to claim 1, wherein the first image comprises a distorted image.
  • 4. The system according to claim 1, wherein the camera comprises a wide-angle camera.
  • 5. The system according to claim 1, wherein the wide-angle camera uses at least one of a wide-angle lens or an ultra-wide-angle lens.
  • 6. The system according to claim 1, wherein the object detection terminal is configured to detect a plurality of objects comprising the first object and the second object.
  • 7. The system according to claim 1, wherein the object detection terminal comprises: a communication unit configured to communicate with the movable body and receive the first image acquired by being captured by the camera; a convolution filter generation unit configured to generate a convolution filter that matches a distortion shape of the camera; a main control unit configured to correct the distortion by applying the convolutional filter to the first image; a feature extraction unit configured to calculate a feature map through a convolution operation so as to infer a region from the second image corrected by the main control unit; and an object classification module configured to receive the feature map as an input so as to classify the first object and the second object contained in the inferred region.
  • 8. The system according to claim 7, wherein, when the convolutional filter is applied to the second image, the main control unit uses the following Equation:
  • 9. The system according to claim 7, wherein the main control unit is configured to infer regions with high probability, in which at least one of the first object and the second object exists, from the feature map calculated by the feature extraction unit.
  • 10. The system according to claim 9, wherein the main control unit is configured to infer an object region by utilizing a plurality of different anchor boxes so as to detect the first object and the second object.
  • 11. The system according to claim 10, wherein the main control unit is configured to use an RPN algorithm.
  • 12. The system according to claim 7, wherein the convolutional filter generation unit is configured to generate a convolutional filter configured to correct distortion due to a parameter of the camera.
  • 13. The system according to claim 12, wherein the parameter of the camera comprises at least one of a focal length of a lens used in the camera, an angle of view of the camera, and a relative aperture of the camera.
  • 14. The system according to claim 7, wherein the convolution filter generation unit is configured to generate a convolution filter configured to correct distortion due to a parameter of the camera, which comprises a focal length of a lens, an angle of view of the camera, and a relative aperture of the camera.
  • 15. A system for detecting an object in an underground space, the system comprising: a movable body moving in the underground space; a camera mounted on the movable body to photograph a first image containing at least one object placed in the underground space; an object detection terminal configured to receive the first image and correct the first image so as to detect an object comprised in the first image within a corrected second image; and a communication network configured to enable network communication between the movable body and the object detection terminal, wherein the object detection terminal comprises a main control unit, which is provided with a convolution filter and is configured to correct the second image by using the convolution filter, wherein the main control unit uses the following Equation:
  • 16. The system according to claim 15, wherein the Δpn is calculated by the following Equation: Δpn=Δpn+c−Δpc where each of the Δpn+c and the Δpc is a coordinate (Δx, Δy) for each position.
  • 17. A method for detecting an object in an underground space, the method comprising: (a) photographing an underground facility by a camera mounted on a movable body; (b) acquiring a first image photographed by the camera to transmit the first image to an object detection terminal; (c) allowing the object detection terminal to receive the first image; (d) allowing the object detection terminal to generate a convolutional filter that matches distortion of the camera using a camera parameter and generate a second image from the first image using the convolutional filter; (e) allowing the object detection terminal to calculate a feature map through a convolution operation so as to infer a region from the second image; (f) allowing the object detection terminal to classify which object is comprised in an inferred region using the feature map and classify which object is comprised in a corresponding region using a convolutional layer; and (g) allowing the object detection terminal to visualize the object into an object bounding box.
  • 18. The method according to claim 17, wherein the parameter of the camera is acquired by the object detection terminal before generating the convolutional filter.
  • 19. The method according to claim 17, wherein the convolutional filter is provided as the following Equation:
  • 20. The method according to claim 17, wherein the convolutional filter is provided as the following Equation:
Priority Claims (1)
Number Date Country Kind
10-2022-0163299 Nov 2022 KR national