The present invention relates to a technique for detecting an object on the basis of an image captured by a camera.
A technique of detecting an object by performing image processing on an image captured by a camera is known. For example, the object detection device disclosed in Patent Literature 1 includes a parameter setting means, a resizing processing unit, a search setting unit, and a multiple-size raster scan processing unit. The parameter setting means includes a scan range changing means for changing a scan range in one frame of an input video in accordance with a distance class corresponding to a distance between a camera and an object to be detected, a detection window size changing means for changing a detection window size in accordance with the distance class, and a detection window movement amount changing means for changing a detection window movement amount in accordance with the distance class. The resizing processing unit receives an input video taken with the camera and resizes the input video in the scan range set according to the distance class by the scan range changing means. The search setting unit sets searching of the resized image processed by the resizing processing unit with the detection window size set according to the distance class by the detection window size changing means and the detection window movement amount set according to the distance class by the detection window movement amount changing means. The multiple-size raster scan processing unit scans, at least one stage at each class of the distance class, searches the resized image in the scan range with the detection window size and the detection window movement amount set according to the distance class by the search setting unit, and registers an area of the detection window at a position where the object to be detected is present, thereby detecting the object to be detected.
In order to ensure the safety of persons working around a construction machine, it is conceivable to mount the object detection device described above on the construction machine. In the case of construction machines, it is further required to improve the accuracy and speed of object detection because of concerns for human safety.
Patent Literature 1: JP 2017-156988 A
An object of the present invention is to provide an object detection device and an object detection method for a construction machine, which can improve accuracy and speed of object detection.
An object detection device for a construction machine according to one aspect of the present invention includes: a correction unit that corrects, by using projective transformation, distortion of an image captured by a camera capturing an image of a detection target object and mounted on a construction machine; and a processing unit that sets, in the image after correction, a plurality of regions provided in accordance with a distance between the camera and the object and each having a different area, and performs image processing of detecting the object with respect to a part of the image corresponding to each of the plurality of regions.
An embodiment of the present invention will be described below in detail with reference to the drawings. In each of the drawings, components denoted by identical reference numerals are identical components, and the description of components having already been described is omitted. In this description, a generic term is indicated by a reference numeral without a suffix (e.g., person 10), and an individual component is indicated by a reference numeral with a suffix (e.g., person 10-1).
The construction machine 100 includes a cabin 101 in which an operator operates the construction machine 100. The object detection device 1 is arranged in the cabin 101. The camera 2 is mounted at a predetermined position of the construction machine 100, and its capturing range is set to a range in which the detection target person 10 (e.g., a person 10-1) appears. The camera 2 sends a captured image Im to the object detection device 1.
The image input unit 3 inputs the image Im (moving image) captured by the camera 2. The image input unit 3 is implemented by an input interface (input interface circuit).
The control processing unit 4 is a hardware processor that performs various controls and various processing of the object detection device 1, and includes, as functional blocks, a correction unit 41, a processing unit 42, a storage unit 43, and a display control unit 44. The control processing unit 4 is implemented by hardware such as a central processing unit (CPU), a graphics processing unit (GPU), a random access memory (RAM), a read only memory (ROM), and a hard disk drive (HDD), and programs and data for executing the functions of the functional blocks described above.
The correction unit 41 corrects distortion of the image Im having been input to the image input unit 3. The reason why the distortion of the image Im is corrected will be described. Since the camera 2 is mounted above the height of the person 10 and the optical axis 21 is directed obliquely downward, the image Im captured by the camera 2 has a relatively large distortion (e.g., the image is inclined relatively large). If person detection were performed in this state, the accuracy of the person detection would decrease.
The correction unit 41 performs projective transformation on the image Im-1 to reduce the horizontal length of the image Im-1 from the lower side toward the upper side of the image Im-1 in the vertical direction, thereby correcting the shape of the image Im-1 into a trapezoid.
Since the distortion of the image Im-1 is reduced or eliminated by the correction, the amount by which the person 10-2 appearing on the left side in the image Im-1 is inclined to the left and the amount by which the person 10-3 appearing on the right side in the image Im-1 is inclined to the right can be reduced or eliminated.
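The correction can be illustrated with a minimal sketch using OpenCV's projective transformation functions; the image size and the corner inset below are hypothetical stand-ins for values that would, in the embodiment, follow from the projective transformation matrix PM computed for the camera's height and angle.

```python
import cv2
import numpy as np

def correct_distortion(image):
    """Warp the distorted image so that its content becomes a trapezoid
    that is narrower at the top, as described above."""
    h, w = image.shape[:2]
    # Source corners: the four corners of the input image
    # (top-left, top-right, bottom-left, bottom-right).
    src = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
    # Destination corners: reduce the horizontal length toward the upper side.
    inset = 0.2 * w  # hypothetical inset; derived from camera height/angle in practice
    dst = np.float32([[inset, 0], [w - inset, 0], [0, h], [w, h]])
    pm = cv2.getPerspectiveTransform(src, dst)  # plays the role of the matrix PM
    return cv2.warpPerspective(image, pm, (w, h))
```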
The correction of the distortion of the image Im will be described with reference to schematic views.
In the center of the image Im-2, the inclination amount is smaller than at the left and right sides of the image Im-2.
The processing unit 42 performs image processing of detecting the person 10 on the image Im after correction (e.g., the corrected image Im-1).
The processing unit 42 executes the image processing described above for each of a plurality of regions R provided in accordance with the distance between the camera 2 and the object. Not a single region but a plurality of regions are provided in accordance with the distance between the camera 2 and the person 10. Description will be given of three regions R (regions R-1, R-2, and R-3) as an example in the embodiment.
The region R-1 is used to detect the person 10 whose distance from the camera 2 is a short distance (e.g., distance of 3 meters or less from the camera 2). The region R-2 is used to detect the person 10 whose distance from the camera 2 is a medium distance (e.g., distance of 3 to 6 meters from the camera 2). The region R-3 is used to detect the person 10 whose distance from the camera 2 is a long distance (e.g., distance of 6 to 10 meters from the camera 2).
The area of the region R is varied according to the distance between the camera 2 and the person 10 (object). The reason is as follows. The longer the distance between the camera 2 and the person 10 is, the smaller the area of a part where the person 10 may appear in the image Im becomes. For example, in the detection of the person 10 walking outdoors, the longer the distance between the camera 2 and the person 10 is, the larger the area of a part where the ground appears and the area of a part where the sky appears become, and the smaller the area of a part where the person 10 may appear becomes.
Therefore, the distance between the camera 2 and the person 10 is classified into a short distance, a medium distance, and a long distance, and the area of each region R is set so that the following relation is established. This can reduce the amount of image processing of person detection, and hence the speed of person detection can be improved.
Area of Region R-1 (Short Distance) > Area of Region R-2 (Medium Distance) > Area of Region R-3 (Long Distance)
The regions R-1, R-2, and R-3 have a horizontal length smaller than the horizontal length of the image Im before correction. The range in which the person 10 whose distance from the camera 2 is a short distance appears in the image Im after correction is almost the entire image Im after correction. The region R-1 for a short distance is set to almost the entire image Im after correction. Since the construction machine 100 appears in the lower part of the image Im after correction, the region R-1 for a short distance is not set in this part.
The region R-2 for a medium distance is set in a part from slightly below the center of the image Im after correction to the upper end of the image Im. This is because the person 10 whose distance from the camera 2 is a medium distance appears in this part. The region R-3 for a long distance is set to a part above the center of the image Im after correction. This is because the person 10 whose distance from the camera 2 is a long distance appears in this part.
Due to the above, the processing unit 42 can reduce the amount of image processing compared with the case of performing the image processing of person detection on the entire image Im after correction.
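A minimal sketch of how the region information RI might express the three regions on the corrected image follows; every coordinate here is a hypothetical illustration, since the embodiment stores the actual setting positions per camera height and angle.

```python
def set_regions(im_w, im_h):
    """Return the three regions R as (x, y, width, height) on the corrected
    image; the areas satisfy short > medium > long, as in the relation above."""
    return {
        "short":  (0, 0, im_w, int(0.85 * im_h)),   # R-1: almost the entire image,
                                                    # excluding the machine at the bottom
        "medium": (0, 0, im_w, int(0.55 * im_h)),   # R-2: from slightly below center to the top
        "long":   (int(0.25 * im_w), 0,
                   int(0.50 * im_w), int(0.40 * im_h)),  # R-3: upper central part
    }

def crop_region(image, region):
    x, y, w, h = region
    return image[y:y + h, x:x + w]  # the part of the image overlapping the region
```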
The processing unit 42 sets the region R-3, with both ends of the region R-3 for a long distance being within the image Im after correction in the horizontal direction of the image Im after correction, and sets the regions R-1 and R-2, with both ends of the region R-1 for a short distance and both ends of the region R-2 for a medium distance being out of the both ends of the image Im after correction in the horizontal direction of the image Im after correction. That is, the processing unit 42 sets the region R, of the plurality of regions R, used for person detection at a distance larger than a predetermined value within the image Im after correction, and sets the region R, of the plurality of regions R, used for person detection at a distance equal to or smaller than the value out of the image Im after correction.
The region R is a quadrangle, and the image Im after correction is a trapezoid. For this reason, when the region R is set in the image Im after correction, it is impossible to align both ends of the region R with both ends of the image Im after correction in the horizontal direction of the image Im after correction. When both ends of the region R are out of both ends of the image Im after correction in the horizontal direction of the image Im after correction, the processing unit 42 performs image processing of person detection even on the part of the region R that is out of the image Im, and this image processing is unnecessary.
On the other hand, when both ends of the region R are within the image Im after correction (both ends of the region R are not out of both ends of the image Im after correction) in the horizontal direction of the image Im after correction, the unnecessary image processing described above can be eliminated. This can reduce the amount of image processing. However, at both ends of the image Im after correction, there are parts where image processing of person detection is not performed. This decreases the accuracy of the person detection.
The vicinity of the construction machine 100 is dangerous. Therefore, in the embodiment, the accuracy of person detection is given priority when the distance between the camera 2 and the person 10 is a short distance or a medium distance (in the case of person detection at a distance equal to or smaller than a predetermined value), and elimination of unnecessary image processing is given priority when the distance is a long distance (in the case of person detection at a distance larger than the predetermined value).
It is to be noted that an example in which both ends of the region R cannot be aligned with both ends of the image Im after correction due to the relationship between the shape of the region R and the image Im after correction is not limited to the case in which the shape of the region R is a quadrangle and the shape of the image Im after correction is a trapezoid. For example, it also occurs in the case where the shape of the region R is a rectangle and the shape of the image Im after correction is a parallelogram.
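Whether both ends of a rectangular region stay within the trapezoidal corrected image can be checked geometrically; the following is a minimal sketch assuming a symmetric trapezoid that is narrowest at the top, with all parameters supplied by the caller.

```python
def ends_within_trapezoid(region, w_top, w_bottom, h):
    """Return True if both horizontal ends of the rectangular region
    (x, y, width, height), with y measured from the top of the image,
    lie inside a centered trapezoid of top width w_top, bottom width
    w_bottom, and height h. The check is made at the region's top edge,
    where the trapezoid is narrowest within the region's vertical span."""
    x, y, rw, rh = region
    half = (w_top + (w_bottom - w_top) * y / h) / 2.0  # half-width at height y
    cx = w_bottom / 2.0  # horizontal center of the trapezoid
    return (x >= cx - half) and (x + rw <= cx + half)
```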
The storage unit 43 includes a first storage unit 431, a second storage unit 432, and a third storage unit 433. The first storage unit 431 stores in advance a projective transformation matrix PM used for the projective transformation performed by the correction unit 41.
The second storage unit 432 stores in advance region information RI indicating conditions under which the plurality of regions R is set in the image Im after correction. The conditions for the plurality of regions R are the number of the regions R, the setting positions (coordinates on the image Im after correction) of the regions R, and the like. While three regions R are described as an example in the embodiment, the number of the regions R is not limited to three.
The third storage unit 433 stores in advance a learning model LM subjected to machine learning. The processing unit 42 executes the prediction/recognition phase of the machine learning by using the learning model LM, thereby detecting the person 10 appearing in the image Im after correction. The learning model LM includes a learning model constructed with an image of a person in an upright state as training data, a learning model constructed with an image of a person in a half-sitting state as training data, and a learning model constructed with an image of a person in a squatting state as training data. The learning model LM is used for detecting the person 10 in each of an upright state, a half-sitting state, and a squatting state. At a construction site, the person 10 is mainly in any of an upright state, a half-sitting state, and a squatting state. For example, only with the learning model for detecting the person 10 in an upright state, the detection accuracy of the person 10 in a half-sitting state or a squatting state decreases. In the embodiment, the accuracy of person detection is improved by including the learning model LM for detecting the person 10 in each of an upright state, a half-sitting state, and a squatting state.
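A minimal sketch of running the three posture-specific models and merging their results is shown below; `models` and its `detect()` method are hypothetical placeholders for however the trained classifiers are actually invoked.

```python
POSTURES = ("upright", "half_sitting", "squatting")

def detect_person(image_part, models):
    """models: dict mapping each posture name to a trained detector whose
    (hypothetical) detect() returns bounding boxes of detected persons."""
    detections = []
    for posture in POSTURES:
        # Each model was constructed with images of one posture as training data.
        detections.extend(models[posture].detect(image_part))
    return detections  # non-empty if a person in any posture was detected
```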
There are various types of the construction machine 100, and the height of the position of the camera 2 and the angle of the optical axis 21 of the camera 2 differ depending on the type of the construction machine 100.
If the projective transformation matrix PM, the region information RI, and the learning model LM are determined in accordance with the combination, the accuracy of person detection is improved. Therefore, the first storage unit 431 stores in advance a plurality of combinations and a plurality of projective transformation matrices PM in association with each other regarding the projective transformation matrix PM calculated in accordance with the combination of the height of the position of the camera 2 and the angle of the optical axis 21 of the camera 2. The second storage unit 432 stores in advance a plurality of combinations and a plurality of pieces of region information RI in association with each other regarding the region information RI determined in accordance with the combination of the height of the position of the camera 2 and the angle of the optical axis 21 of the camera 2. The third storage unit 433 stores in advance a plurality of combinations and a plurality of learning models LM in association with each other regarding the learning model LM constructed in accordance with the combination of the height of the position of the camera 2 and the angle of the optical axis 21 of the camera 2. Therefore, the object detection device 1 according to the embodiment can be applied to various types of the construction machine 100.
Using the operation unit 6, the person who sets the projective transformation matrix PM, the region information RI, and the learning model LM inputs, to the control processing unit 4, the height of the position of the camera 2 and the angle of the optical axis 21 of the camera 2 mounted on the construction machine 100. The correction unit 41 selects the projective transformation matrix PM associated with the combination of the input height and angle from the plurality of projective transformation matrices PM stored in the first storage unit 431, and executes correction of the inclination of the image Im. The processing unit 42 selects the region information RI associated with the combination of the input height and angle from the plurality of pieces of region information RI stored in the second storage unit 432, and executes the image processing of person detection on the image Im after correction. The processing unit 42 selects the learning model LM associated with the combination of the input height and angle from the plurality of learning models LM stored in the third storage unit 433, and executes the machine learning.
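The selection amounts to a lookup keyed by the (height, angle) combination; a minimal sketch follows, with hypothetical keys and placeholder values standing in for the stored matrices, region information, and models.

```python
# Hypothetical tables; in the embodiment these correspond to the first,
# second, and third storage units (431, 432, 433) respectively.
PM_TABLE = {(2.5, 30.0): "PM_a", (3.0, 45.0): "PM_b"}
RI_TABLE = {(2.5, 30.0): "RI_a", (3.0, 45.0): "RI_b"}
LM_TABLE = {(2.5, 30.0): "LM_a", (3.0, 45.0): "LM_b"}

def select_settings(height_m, angle_deg):
    """Return the matrix, region information, and learning model associated
    with the input combination of camera height and optical-axis angle."""
    key = (height_m, angle_deg)
    return PM_TABLE[key], RI_TABLE[key], LM_TABLE[key]
```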
The display control unit 44 causes the display unit 5 to display various images, various pieces of information, and the like. For example, the display control unit 44 causes the display unit 5 to display in real time the image Im (moving image) captured by the camera 2 during the operation of the construction machine 100, and causes the display unit 5 to display an alarm when the processing unit 42 detects the person 10. As for the alarm, for example, when the processing unit 42 detects the person 10, the display control unit 44 causes the display unit 5 to display the image Im to which a frame surrounding the person 10 is added. The color of the frame may be changed in accordance with the distance between the camera 2 and the person 10. For example, the display control unit 44 sets the color of the frame surrounding the person 10 to red when the person 10 present at a short distance is detected, sets the color of the frame surrounding the person 10 to yellow when the person 10 present at a medium distance is detected, and sets the color of the frame surrounding the person 10 to green when the person 10 present at a long distance is detected.
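A minimal sketch of drawing the alarm frame with the distance-dependent colors described above (OpenCV uses BGR channel order):

```python
import cv2

FRAME_COLORS = {
    "short": (0, 0, 255),     # red for a person detected at a short distance
    "medium": (0, 255, 255),  # yellow for a medium distance
    "long": (0, 255, 0),      # green for a long distance
}

def draw_alarm(image, bbox, distance_class):
    """Draw a frame surrounding the detected person; bbox is (x, y, w, h)."""
    x, y, w, h = bbox
    cv2.rectangle(image, (x, y), (x + w, y + h), FRAME_COLORS[distance_class], 2)
    return image
```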
The display unit 5 is implemented by a liquid crystal display, an organic light emitting diode display, or the like.
The operation unit 6 is a device with which the user of the object detection device 1 (the operator of the construction machine 100) inputs operations of the object detection device 1, and with which the person who sets the projective transformation matrix PM and the like inputs the information (the height of the position of the camera 2 and the angle of the optical axis 21) necessary for the setting. The operation unit 6 is implemented by a touch screen, a hardware key, or the like.
The operation of the object detection device 1 according to the embodiment will be described.
The correction unit 41 performs projective transformation, by using the selected projective transformation matrix PM, on the image Im sent to the control processing unit 4, thereby correcting distortion (inclination) of the image (step S2). Thus, for example, the distortion of the image Im-1 described above is corrected.
The processing unit 42 sets the region R in the image Im after correction (step S3). Here, of the three regions R, the region R-1 for a short distance is set.
The processing unit 42 performs the person detection by executing the recognition phase of the machine learning on the part corresponding to the region R-1 of the image Im in which the region R-1 for a short distance is set (step S4). More specifically, the processing unit 42 performs image processing of extracting the feature amount of the person 10 on the part corresponding to the region R-1. Here, the feature amount is a histogram of oriented gradients (HOG) feature amount. The processing unit 42 discriminates whether or not the person 10 exists in the region R-1 on the basis of the extracted HOG feature amount. For this discrimination, for example, a support vector machine (SVM) and a Cascade-AdaBoost classifier can be used.
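The HOG-plus-classifier pipeline can be sketched with OpenCV's built-in HOG descriptor and its default pedestrian SVM; this substitutes a stock detector for the embodiment's own trained learning models LM, so it illustrates the technique rather than the exact classifier used.

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_in_region(image, region):
    """Run HOG-based person detection on the part of the corrected image
    corresponding to the given region (x, y, width, height)."""
    x, y, w, h = region
    part = image[y:y + h, x:x + w]
    # detectMultiScale returns boxes relative to the cropped part.
    boxes, weights = hog.detectMultiScale(part, winStride=(8, 8))
    # Shift boxes back into full-image coordinates.
    return [(bx + x, by + y, bw, bh) for (bx, by, bw, bh) in boxes]
```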
The technique of person detection is not limited to machine learning, and may be pattern matching or optical flow, for example.
When the processing unit 42 discriminates the presence of the person 10 in the region R-1, it is determined that the person 10 has been detected from the region R-1. When the processing unit 42 discriminates the absence of the person 10 in the region R-1, it is determined that the person 10 has not been detected from the region R-1.
The learning model LM selected by the processing unit 42 includes a learning model constructed with an image of a person in an upright state as training data, a learning model constructed with an image of a person in a half-sitting state as training data, and a learning model constructed with an image of a person in a squatting state as training data. Using each of these learning models, the processing unit 42 performs the processing of person detection (step S4).
The processing unit 42 determines whether or not the processing of person detection (step S4) has ended for all of the three regions R-1, R-2, and R-3 (step S5). When the processing unit 42 determines that the processing of person detection has not ended for all of the three regions R-1, R-2, and R-3 (No in step S5), the processing unit 42 sets the region R for the image Im after correction (step S3). Here, out of the three regions R-1, R-2, and R-3, the region R-2 for a medium distance is set.
The processing unit 42 performs person detection for the region R-2 for a medium distance by using a method similar to that in the case of the region R-1 for a short distance (step S4).
The processing unit 42 determines whether or not the processing of person detection (step S4) has ended for all of the three regions R-1, R-2, and R-3 (step S5). When the processing unit 42 determines that the processing of person detection has not ended for all of the three regions R-1, R-2, and R-3 (No in step S5), the processing unit 42 sets the region R for the image Im after correction (step S3). Here, out of the three regions R-1, R-2, and R-3, the region R-3 for a long distance is set.
The processing unit 42 performs person detection for the region R-3 for a long distance by using a method similar to that in the case of the region R-1 for a short distance (step S4).
The processing unit 42 determines that the processing of person detection has ended for all of the three regions R-1, R-2, and R-3 (Yes in step S5). When the processing unit 42 detects, in step S4, the person 10 (Yes in step S6), the display control unit 44 causes the display unit 5 to display an alarm (step S7). When the processing unit 42 does not detect, in step S4, the person 10 (No in step S6), the display control unit 44 does not cause the display unit 5 to display an alarm (step S8).
The object detection device 1 executes the processing of steps S2 to S8 at predetermined time intervals (sampling period). It is to be noted that the processing unit 42 may lengthen the time interval when the distance between the camera 2 and the person 10 is long, and may shorten the time interval when the distance is short. For example, the processing unit 42 sets the time interval of the person detection using the region R-1 for a short distance to be shorter than the time interval of the person detection using the region R-2 for a medium distance and the region R-3 for a long distance.
If the time interval of person detection is long, the amount of image processing of person detection (series of processing amounts from step S2 to step S5) can be reduced, but the accuracy of person detection decreases. On the other hand, if the time interval of person detection is short, the amount of image processing of person detection is increased, but the accuracy of person detection is improved. The vicinity of the construction machine 100 is dangerous. Therefore, the accuracy of person detection is given priority when the distance between the camera 2 and the person 10 is short, and reduction of the amount of image processing of person detection is given priority when the distance is long.
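A minimal sketch of such a detection loop, with a shorter sampling period for the short-distance region than for the others, is the following; the concrete periods are hypothetical.

```python
import time

# Hypothetical sampling periods in seconds; short distance is checked most often.
PERIODS = {"short": 0.1, "medium": 0.3, "long": 0.5}

def detection_loop(capture_and_detect):
    """capture_and_detect(distance_class) performs steps S2 to S5 for one region."""
    next_run = {k: 0.0 for k in PERIODS}
    while True:
        now = time.monotonic()
        for distance_class, period in PERIODS.items():
            if now >= next_run[distance_class]:
                capture_and_detect(distance_class)
                next_run[distance_class] = now + period
        time.sleep(0.01)  # avoid busy-waiting
```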
According to the embodiment, distortion of the image Im is corrected by performing projective transformation on the image Im captured by the camera 2 (step S2), and person detection is performed by performing machine learning based on the HOG feature amount on the image Im after correction (step S4). This can improve the accuracy of person detection. This will be explained in detail.
An HOG feature amount 12 is one of the HOG feature amounts of the person 10-3 appearing at the right end of the image Im-1. The person 10-3 is inclined to the right at an inclination angle of 110 degrees, and hence the histogram has a large component in the 110 degree direction. The persons 10-2 and 10-3 thus do not appear in a normal posture; normally, for an upright person, the histogram has a large component in the 90 degree direction. Therefore, when the processing unit 42 discriminates a person by using the HOG feature amount extracted from the image Im-1 before correction, the accuracy of the person detection decreases.
After the correction, the inclination of the persons 10-2 and 10-3 is reduced or eliminated, so that the histogram has a large component in the 90 degree direction, and the accuracy of the person detection is improved.
The calculation method of the projective transformation matrix PM will be described below.
For the person 10-2 standing upright at a distance z from the camera 2, a distance z1 (projection distance) between the person 10-2 and the projection plane of the camera 2 is expressed by the equation (1). In the equation (1), ω is an angle defined by the person 10-2 and the optical axis 21 of the camera 2 with respect to the projection plane of the camera 2, and is expressed by the equation (2).
Here, h(y) is the height (stature) of the person 10-2. In calculation of the distortion amount, the height of the foot of the person 10-2 is 0 (h(y)=0), and hence the distortion amount does not depend on the height of the person 10-2.
In the projection distance z1, a visual field width w of the projection plane of the camera 2 is expressed by the equation (3). Due to this, a pixel coordinate Xleft_bottom of the foot of the person 10-2, which appears at the left end of the range indicated by the angle of view 2ϕ of the camera 2, is expressed by the equation (4).
The pixel coordinate Xleft_bottom corresponds to the distortion amount of the person 10-2 standing upright. The projective transformation matrix PM can then be obtained by using the equation (5).
Here, x and y are X, Y coordinates before the projective transformation, and u and v are X, Y coordinates after the projective transformation. Also, a, b, c, d, e, f, g, and h are projective transformation coefficients constituting the projective transformation matrix PM. The projective transformation coefficient is calculated by solving simultaneous equations for the four corresponding point coordinates before and after the projective transformation shown in the following table.
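For reference, the general form of a projective transformation with the eight coefficients a to h named above is the following standard identity; it is shown as a sketch of the role of equation (5), not a reproduction of the embodiment's specific equations.

\[
u = \frac{a x + b y + c}{g x + h y + 1}, \qquad
v = \frac{d x + e y + f}{g x + h y + 1}
\]

Each corresponding point pair \((x_i, y_i) \to (u_i, v_i)\) yields two linear equations in a to h, so the four pairs yield the eight equations needed to determine the eight coefficients.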
Although in the embodiment the description has been given of person detection as an example, the present invention can also be applied to detection of an object other than a person (e.g., another construction machine or a safety fence) present at a civil engineering or construction site.
(Summary of Embodiment)
An object detection device for a construction machine according to one aspect of an embodiment includes: a correction unit that corrects, by using projective transformation, distortion of an image captured by a camera capturing an image of a detection target object and mounted on a construction machine; and a processing unit that sets, in the image after correction, a plurality of regions provided in accordance with a distance between the camera and the object and each having a different area, and performs image processing of detecting the object with respect to a part of the image corresponding to each of the plurality of regions.
Description will be given of an example in which the detection target is a person. Since the construction machine is relatively large, the mounting position of the camera is usually above the height of the person. Since the vicinity of the construction machine is dangerous, the optical axis of the camera is set obliquely downward so that a person near the construction machine can be detected. For these reasons, the image captured by the camera has a relatively large distortion (e.g., the image is inclined relatively large). When person detection is performed in this state, the accuracy of the person detection decreases. Therefore, the correction unit corrects distortion of the image by using the projective transformation. This can improve the accuracy of person detection (object detection).
The processing unit does not perform image processing of person detection on the entire image but sets a region in the image and performs image processing of person detection on a part of the image corresponding to the region (the part overlapping the region).
The area of the region is varied according to the distance between the camera and the person. The reason is as follows. The longer the distance between the camera and the person is, the smaller the area of a part where the person may appear in the image becomes. For example, in the detection of a person walking outdoors, the longer the distance between the camera and the person is, the larger the area of a part where the ground appears and the area of a part where the sky appears become, and the smaller the area of a part where the person may appear becomes. Therefore, in accordance with the distance between the camera and the person, the processing unit selectively uses the plurality of regions prepared in accordance with the distance. This can reduce the amount of image processing of person detection (object detection), and hence it is possible to improve the speed of person detection (object detection).
In the configuration described above, the correction unit corrects distortion of the image by performing the projective transformation on the image to reduce a horizontal length of the image from a lower side toward an upper side of the image in a vertical direction, and makes a shape of the image a trapezoid.
This configuration is applied when the image captured by the camera is distorted into a reverse trapezoid (inverted trapezoid). In an image distorted into a reverse trapezoid, an object appearing on the left side in the image is inclined to the left, and an object appearing on the right side in the image is inclined to the right, which is different from a normal appearance of the object, thereby causing a decrease in accuracy of object detection.
Then, the correction unit performs projective transformation on the image to reduce the horizontal length of the image from the lower side toward the upper side of the image in the vertical direction, thereby correcting the shape of the image into a trapezoid. Since this eliminates the distortion, it is possible to prevent the object appearing on the left side in the image from being inclined to the left and the object appearing on the right side in the image from being inclined to the right.
In the configuration described above, the plurality of regions are quadrangular in shape, and the processing unit sets the region, of the plurality of regions, used for object detection at the distance larger than a predetermined value, with both ends of the region being within the image after correction in a horizontal direction of the image after correction, and sets the region, of the plurality of regions, used for object detection at the distance equal to or smaller than the value, with both ends of the region being out of the image after correction in a horizontal direction of the image after correction.
The region is a quadrangle, and the image after correction is a trapezoid. For this reason, when the region is set in the image after correction, it is impossible to align both ends of the region with both ends of the image after correction in the horizontal direction of the image after correction.
When both ends of the region are out of both ends of the image after correction in the horizontal direction of the image after correction, the processing unit performs image processing of object detection even on the part of the region that is out of the image, and this image processing is unnecessary.
On the other hand, when both ends of the region are within the image after correction (both ends of the region are not out of both ends of the image after correction) in the horizontal direction of the image after correction, the unnecessary image processing described above can be eliminated. This can reduce the amount of image processing. However, at both ends of the image after correction, there are parts where image processing of object detection is not performed. This decreases the accuracy of the object detection.
The vicinity of the construction machine is dangerous. Therefore, in this configuration, the accuracy of object detection is given priority when the distance between the camera and the object is short (in the case of object detection at a distance equal to or smaller than a predetermined value), and elimination of unnecessary image processing is given priority when the distance is long (in the case of object detection at a distance larger than the predetermined value).
In the configuration described above, the processing unit sets, within the image after correction, the region, of the plurality of regions, used for object detection at the distance larger than a predetermined value, and sets, out of the image after correction, the region, of the plurality of regions, used for object detection at the distance equal to or smaller than the value.
An example in which both ends of the region cannot be aligned with both ends of the image after correction due to the relationship between the shape of the region and the image after correction is not limited to the case in which the shape of the region is a quadrangle and the shape of the image after correction is a trapezoid. For example, it also occurs in the case where the shape of the region is a rectangle and the shape of the image after correction is a parallelogram.
In this configuration, the accuracy of object detection is given priority when the distance between the camera and the object is short (in the case of object detection at a distance equal to or smaller than a predetermined value), and elimination of unnecessary image processing is given priority when the distance is long (in the case of object detection at a distance larger than the predetermined value).
In the configuration described above, the object detection device further includes a first storage unit that stores in advance a plurality of combinations of a height of a position of the camera and an angle of an optical axis of the camera and a plurality of the projective transformation matrices in association with each other regarding the projective transformation matrix used for the projective transformation, calculated in accordance with a combination of a height of a position of the camera and an angle of an optical axis of the camera, in which the correction unit selects the projective transformation matrix associated with a combination of a height of a position of the camera mounted on the construction machine and an angle of an optical axis of the camera, and executes the correction.
There are various types of the construction machine, and the shape and size of the construction machine differ depending on the type. For this reason, the combination of the height of the position of the camera and the angle of the optical axis of the camera differs depending on the type of the construction machine. If the projective transformation matrix is calculated in accordance with this combination, the accuracy of object detection is improved. According to this configuration, since a plurality of combinations of a height of a position and an angle of an optical axis of a camera and a plurality of projective transformation matrices are stored in association with each other regarding the projective transformation matrix calculated in accordance with the combination of the height of the position and the angle of the optical axis of the camera, the object detection device can be applied to various types of the construction machine.
In the configuration described above, the object detection device further includes a second storage unit that stores in advance a plurality of combinations of a height of a position of the camera and an angle of an optical axis of the camera and a plurality of pieces of region information in association with each other regarding the region information indicating conditions under which the plurality of regions are set in the image, determined in accordance with a combination of a height of a position of the camera and an angle of an optical axis of the camera, in which the processing unit selects the region information associated with a combination of a height of a position of the camera mounted on the construction machine and an angle of an optical axis of the camera, and executes the image processing.
As described above, the combination of the height of the position and the angle of the optical axis of the camera differs depending on the type of the construction machine. The conditions for the plurality of regions (the number of regions, the setting position of each region, the area of each region, and the like) differ in accordance with the combination. Therefore, if the region information indicating the conditions for the plurality of regions is determined in accordance with the combination, the accuracy of object detection is improved. According to this configuration, since a plurality of combinations of a height of a position and an angle of an optical axis of a camera and a plurality of pieces of region information are stored in association with each other regarding region information determined in accordance with the combination of the height of the position and the angle of the optical axis of the camera, the object detection device can be applied to various types of the construction machine.
In the configuration described above, the processing unit lengthens a time interval of object detection using each of the plurality of regions when the distance is long, and shortens the time interval when the distance is short.
If the time interval (sampling period) of object detection is long, the amount of image processing of object detection can be reduced, but the accuracy of object detection decreases. On the other hand, if the time interval of object detection is short, the amount of image processing of object detection is increased, but the accuracy of object detection is improved.
The vicinity of the construction machine is dangerous. Therefore, in this configuration, the accuracy of object detection is given priority when the distance between the camera and the object is short, and reduction of the amount of image processing of object detection is given priority when the distance is long.
In the above configuration, the object detection device further includes a third storage unit that stores in advance a learning model subjected to machine learning for detecting a person in each of an upright state, a half-sitting state, and a squatting state, in which the processing unit detects a person, which is the object appearing in the image after correction, by using the learning model (the image processing being execution of a prediction/recognition phase of the machine learning using the learning model).
In this configuration, the detection target object is a person, and the person appearing in an image after correction is detected by the machine learning (e.g., person detection using an HOG feature amount). At a construction site, a person is mainly in any of an upright state, a half-sitting state, and a squatting state. For example, only with a learning model for detecting a person in an upright state, the detection accuracy of the person in a half-sitting state or a squatting state decreases. This configuration improves the accuracy of person detection by including a learning model for detecting a person in each of an upright state, a half-sitting state, and a squatting state.
In the configuration described above, the third storage unit stores in advance a plurality of combinations of a height of a position of the camera and an angle of an optical axis of the camera and a plurality of the learning models in association with each other regarding the learning model constructed in accordance with a combination of a height of a position of the camera and an angle of an optical axis of the camera, and the processing unit selects the learning model associated with a combination of a height of a position of the camera mounted on the construction machine and an angle of an optical axis of the camera, and executes the machine learning.
As described above, the combination of the height of the position and the angle of the optical axis of the camera differs depending on the type of the construction machine. If a learning model is constructed in accordance with the combination, the accuracy of person detection is improved. According to this configuration, since a plurality of combinations of a height of a position and an angle of an optical axis of a camera and a plurality of learning models are stored in association with each other regarding a learning model constructed in accordance with the combination of the height of the position and the angle of the optical axis of the camera, the object detection device can be applied to various types of the construction machine.
An object detection method for a construction machine according to another aspect of the embodiment includes: a correction step of correcting, by using projective transformation, distortion of an image captured by a camera capturing an image of a detection target object and mounted on a construction machine; and a processing step of setting, in the image after correction, a plurality of regions provided in accordance with a distance between the camera and the object and each having a different area, and performing image processing of detecting the object with respect to a part of the image corresponding to each of the plurality of regions.
The object detection method according to another aspect of the embodiment defines the object detection device according to one aspect of the embodiment from the point of view of the method, and has operations and effects similar to those of the object detection device according to one aspect of the embodiment.
Priority application: JP 2018-133532, filed July 2018 (national).
International filing: PCT/JP2019/026924, filed Jul. 8, 2019 (WO).