The present invention relates to an object detection apparatus, an object detection method, and a mobile robot, and more particularly, to a technique that enables a mobile object such as a robot to detect an object in a real-world space.
In recent years, research on photographing the area in front of a vehicle and detecting objects there has been actively conducted for automobiles. For example, Patent Document 1 proposes an in-vehicle compound eye camera apparatus in which optical filters having different characteristics are arranged on the top surface of an imaging element and divided into a plurality of regions, so that the photographed image has different characteristics in each region, and image processing suited to those characteristics is performed on each region.
Patent Document 1: JP 2013-225289 A
The invention disclosed in Patent Document 1 presupposes a four-wheeled automobile, and the optical filter is fixed to the imaging element, so the plurality of regions remain fixed on the image while the compound eye camera apparatus is mounted in the automobile. In the case of a mobile object in which an inclination in the roll direction is ignorable, an object far from the vehicle or an object higher than the vehicle is photographed by an upper portion of the imaging element, that is, appears in an upper portion of the image. On the other hand, an object close to the vehicle, such as its own bonnet or the road surface, is photographed by a lower portion of the imaging element, that is, appears in a lower portion of the image. In other words, for a mobile object in which the inclination in the roll direction is ignorable, a location in the image corresponds to a photographed space, and there is no problem even though the plurality of regions are fixed on the image. However, in a mobile object such as a unicycle or a two-wheeled vehicle in which an inclination in the roll direction occurs, an image whose top and bottom are inclined is photographed when the mobile object is inclined in the roll direction. In other words, when the invention disclosed in Patent Document 1 is applied to a mobile object in which the inclination in the roll direction is unignorable, the location in the image photographed by the camera no longer corresponds to the photographed space, and an inappropriate process is likely to be performed.
The present invention was made in order to solve the above problems, and it is an object of the present invention to provide an object detection apparatus and an object detection method that are capable of properly associating the location in the image with the photographed space, detecting an object at a high detection rate, and improving object detection performance even in a mobile object in which the inclination in the roll direction is unignorable.
An object detection apparatus according to the present invention may employ, for example, the configurations set forth in the claims. Specifically, an object detection apparatus includes a camera pose calculation unit, a region setting unit, a processing method determining unit, an image generating unit, and an object detection unit, in which the camera pose calculation unit acquires information related to the pose of a camera installed in a mobile object, the region setting unit associates a location in an image photographed by the camera with the photographed space based on the pose of the camera and sets a plurality of detection regions on the image based on that association, the processing method determining unit determines an image processing method including a resolution setting for each of the plurality of detection regions, the image generating unit converts the image in each of the detection regions to the set resolution and generates a region image, and the object detection unit detects an object using each of the region images.
According to the present invention, the object detection apparatus calculates the pose of the camera, associates a location in the image with the photographed space based on the calculated pose, divides the image photographed by the camera into a plurality of regions based on that association, and sets a resolution for each region, and thus can detect an object at a high detection rate even when the inclination of the camera is large.
Hereinafter, embodiments of the present invention will be described with reference to the appended drawings. In the drawings, components having the same reference numerals are assumed to have the same function.
The first and second cameras 121 and 122 are cameras that photograph a space in front of the mobile robot 100.
A specific example of the mobile robot 100 is an electric two-wheeled humanoid mobile robot that travels as the wheels are driven by a motor serving as the moving mechanism 124. The mobile robot 100 may instead be an electric one-wheeled mobile robot.
As illustrated in
The CPU 131 reads a program stored in the main storage unit 132 or the auxiliary storage unit 133, executes an operation, and outputs an operation result to the main storage unit 132, the auxiliary storage unit 133, or the mobile object control unit 123.
The main storage unit 132 stores the program executed by the CPU 131, the result of the operation executed by the CPU 131, and setting information used by the information processing apparatus 125. For example, the main storage unit 132 is implemented by a random access memory (RAM), a read only memory (ROM), or the like.
The auxiliary storage unit 133 stores the program executed by the CPU 131, the result of the operation executed by the CPU 131, and setting information used by the information processing apparatus 125. Particularly, the auxiliary storage unit 133 is used to store data that cannot be stored in the main storage unit 132 or to hold data even in a state in which power is turned off. For example, the auxiliary storage unit 133 is configured with a magnetic disk drive such as a hard disk drive (HDD), a non-volatile memory such as a flash memory, or a combination thereof. Information related to objects to be detected, such as the shapes of a vehicle, a person, and a building, is set in the auxiliary storage unit 133 in advance. The display apparatus 140 is configured with an information processing apparatus having a function similar to that of the information processing apparatus 125.
The transceiving apparatus 126 communicates with the display apparatus 140; it receives user commands given from the display apparatus 140 to the mobile robot 100 and outputs processing results of the information processing apparatus 125 to the display apparatus 140.
The storage apparatus 127 stores a processing program for movement control for moving the mobile robot 100, data such as the setting information, a space map of a monitoring target, and the like. The storage apparatus 127 can be implemented using a storage apparatus such as an HDD. The information related to the object which is set in the auxiliary storage unit 133 may be set in the storage apparatus 127 instead.
The external world measuring apparatus 128 is an external world sensor that measures data indicating a relative position or an absolute position of the mobile robot 100 in the outside world (the actual environment), and includes, for example, a landmark detection sensor (a laser range finder or the like) that detects a landmark and measures the relative position of the mobile robot 100, a GPS device that receives radio waves from GPS satellites and measures the absolute position of the mobile robot 100, and the like.
The internal world measuring apparatus 129 is an internal sensor that measures the state of the inside of the mobile robot 100 (the moving mechanism 124), such as the rotational amount of the wheels and the pose (acceleration about three rotational axes and along three translational axes), and includes, for example, a rotary encoder that measures the rotational amount of the wheels of the mobile robot 100, a gyro sensor that measures the pose of the mobile robot 100, and the like.
The configuration of the mobile robot 100 is not limited thereto; it is sufficient that the mobile robot 100 has a moving mechanism and an imaging function.
The processing program for autonomous movement control of the mobile robot 100 may be configured and operated so as to cause the mobile robot 100 to reach a destination by autonomously moving in the actual environment along a planned path while estimating its own position on a map corresponding to the actual environment based on measurement data measured by the external world sensor and the internal world sensor.
The camera pose calculation unit 211 calculates the pose of the second camera 122 using two or more first frame images (first images) acquired by the second camera 122, and calculates the pose of the first camera 121 based on the calculated pose of the second camera 122 and the pose of the first camera 121 with respect to the second camera 122, which is calculated in advance. Further, when the first frame images are determined not to be normal, for example, when a large change is determined to exist across a plurality of consecutive first frame images, or when the frame images have a uniform color and the area in front is determined not to have been photographed, the camera pose calculation unit 211 transmits a flag indicating an abnormality of the frame image to the region calculation unit 212 and the object detection unit 215.
The camera pose calculation unit 211 may acquire the pose of the first camera 121 measured using the gyro sensor with which the internal world measuring apparatus 129 is equipped. Alternatively, a pose obtained by integrating the pose measured by that gyro sensor with the pose of the first camera 121 calculated from the two or more first frame images acquired by the second camera 122 may be output as the result.
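As a minimal sketch of such an integration, one could blend the gyro-based and image-based roll estimates with a complementary filter. The roll-only formulation and the blending weight below are illustrative assumptions, not part of the described apparatus:

```python
def fuse_roll(roll_vision_rad, roll_gyro_rad, alpha=0.98):
    """Complementary filter: the gyro tracks fast motion but drifts,
    while the image-based estimate is slower but drift-free.
    alpha is an assumed blending weight."""
    return alpha * roll_gyro_rad + (1.0 - alpha) * roll_vision_rad
```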
The region calculation unit (region setting unit) 212 acquires the pose of the first camera 121 from the camera pose calculation unit 211, associates a location in a second image of the first camera 121 with the photographed space based on the acquired pose, acquires the object information of the second image of the previous frame stored in the main storage unit 132, and sets a plurality of regions inside the second image photographed by the first camera 121 based on that association and the acquired object information.
The processing method determining unit 213 determines an image processing method, including a setting of the resolution necessary for detection of an object, for each of the plurality of regions in the second image calculated by the region calculation unit 212.
The image generating unit 214 generates a region image by performing image processing on each of a plurality of regions in the second image calculated by the region calculation unit 212 based on the image processing method including the resolution determined by the processing method determining unit 213.
The object detection unit 215 acquires the flag indicating the abnormality of the frame image from the camera pose calculation unit 211, acquires a plurality of region images from the image generating unit 214, detects an object in each of the region images in the second image based on the object information, integrates an obtained detection result with the object information obtained at a previous timing, stores an integration result in the main storage unit 132, and transmits the integration result to the mobile object control unit 123. The detection result of the object stored in the main storage unit 132 is also output to the display apparatus 140 through the transceiving apparatus 126.
First, at a timing Tn, the second camera 122 photographs the space in front of the mobile robot 100 and acquires two or more temporally consecutive first frame images (S301). At the same timing, the first camera 121 photographs the space in front of the mobile robot 100 and acquires one second frame image (S302). The region calculation unit 212 calculates an obstacle map based on the object information obtained at the immediately previous timing Tn−1 and stored in the main storage unit 132 and on information about the region that was invisible (outside the field of view) at the immediately previous timing Tn−1 (S303).
Then, the camera pose calculation unit 211 calculates a plurality of feature points on each of the two or more first frame images stored in the main storage unit 132, associates the calculated feature points between the images, calculates the pose of the second camera 122 based on the set of associated feature points, and calculates the pose of the first camera 121 with respect to the world coordinate system based on the pose of the second camera 122 and the pose of the first camera 121 with respect to the second camera 122, which is calculated in advance (S304).
Then, the region calculation unit 212 acquires the pose of the first camera 121 from the camera pose calculation unit 211, associates the location in the second image with the photographed space based on the pose, and calculates a plurality of detection regions Ci in the second image photographed by the first camera 121 based on the object information and the obstacle map (S305).
Then, the processing method determining unit 213 determines an image processing method (a conversion rate of mi/ni and a cutoff frequency fi of a low-pass filter), including a resolution for object detection, for each of the detection regions Ci calculated by the region calculation unit 212 (S306).
Then, the image generating unit 214 generates N images by clipping the portions of the detection regions Ci from the second image stored in the main storage unit 132, using the detection regions Ci calculated by the region calculation unit 212 and the image processing method determined by the processing method determining unit 213 (S307).
Finally, the object detection unit 215 performs an object detection process on the generated N images, and detects an object (S308).
[S301 to S302 and S304] (Image Acquisition and Camera Pose Calculation):
First, image acquisition and camera pose calculation processes (S301 to S302 and S304) will be described with reference to
As illustrated in
Object information (00-3) obtained from an image photographed at the immediately previous timing Tn−1 is also stored in the main storage unit 132.
After step S301 is completed, the camera pose calculation unit 211 calculates a plurality of feature points for each of the two or more frame images stored in the main storage unit 132 (which is referred to as an image collection (01-1) in
Pc=Rcw×Pw+Tcw (1)
The translation Tcw and the rotation Rcw may be calculated based on the set of associated feature points using a known method.
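For reference, one such known method is the essential-matrix approach available in OpenCV. The sketch below, with an illustrative choice of feature detector and matcher, is one possible realization under those assumptions, not the method prescribed by the patent:

```python
import cv2
import numpy as np

def camera_pose_from_frames(img_prev, img_curr, K):
    """Estimate the rotation Rcw and translation Tcw (up to scale)
    between two consecutive frames from matched feature points."""
    orb = cv2.ORB_create(2000)                      # assumed feature detector
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)             # associate the feature points
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t
```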
Further, when the distance on the image between a pair of associated feature points is larger than a predetermined distance, or when a predetermined number or more of feature points have failed to be associated, the change between the temporally consecutive frame images is regarded as large, and the flag indicating the abnormality of the frame image is transmitted to the region calculation unit 212 and the object detection unit 215. The flag is likewise transmitted when the frame images have a uniform color. When the flag indicating the abnormality of the frame image is transmitted, the pose in the current frame is calculated using the camera poses of several immediately preceding frames. For example, the average of the temporal changes of the camera pose over several immediately preceding frames may be added to the camera pose of the immediately previous frame, and the result may be taken as the pose of the current frame.
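A minimal sketch of this fallback, assuming the recent poses are kept as translation 3-vectors (rotations would need separate handling, e.g. quaternion interpolation); the window length is an assumption:

```python
import numpy as np

def predict_translation(recent_T, window=5):
    """Extrapolate the camera translation for the current frame from the
    average temporal change over the last few frames (abnormality case)."""
    T = np.asarray(recent_T[-window:], dtype=float)
    mean_delta = np.diff(T, axis=0).mean(axis=0)   # average change per frame
    return T[-1] + mean_delta                      # previous pose + average change
```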
Next, a method of calculating the pose of the first camera 121 for the world coordinate system based on the pose of the second camera 122 and the pose of the first camera 121 with respect to the second camera 122 which is calculated in advance will be described. When the translation of the first camera 121 relative to the second camera 122 is indicated by Tc2c1, and the rotation thereof is indicated by Rc2c1, a relation between coordinates Pc1 indicating the coordinate system of the second camera 122 and coordinates Pc2 indicating the coordinate system of the first camera 121 is expressed by Formula (2):
Pc2=Rc2c1×Pc1+Tc2c1 (2)
The pose of the first camera 121 is the translation Tc2w and the rotation Rc2w of the first camera 121 relative to the world coordinate system, and is calculated as Tc2w=Rc2c1×Tc1w+Tc2c1 and Rc2w=Rc2c1×Rc1w based on Formula (3), which is obtained from the relational expressions of Formulas (1) and (2) (here Rc1w and Tc1w denote the rotation Rcw and translation Tcw of the second camera 122 from Formula (1)):
Pc2=(Rc2c1×Rc1w)×Pw+(Rc2c1×Tc1w+Tc2c1) (3)
The translation Tc2c1 and the rotation Rc2c1 of the first camera 121 relative to the second camera 122 are calculated in advance as described above. The calculated pose of the first camera 121 is stored in the main storage unit 132 and used as prior information when the pose of the first camera 121 is calculated at a next timing.
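A direct transcription of Formula (3), assuming the rotations are 3×3 numpy matrices and the translations are 3-element vectors:

```python
import numpy as np

def first_camera_pose(R_c1w, T_c1w, R_c2c1, T_c2c1):
    """World pose of the first camera 121 from the pose of the second
    camera 122 (R_c1w, T_c1w) and the fixed inter-camera pose."""
    R_c2w = R_c2c1 @ R_c1w                 # Rc2w = Rc2c1 x Rc1w
    T_c2w = R_c2c1 @ T_c1w + T_c2c1        # Tc2w = Rc2c1 x Tc1w + Tc2c1
    return R_c2w, T_c2w
```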
[S303] (Obstacle Map Calculation):
The obstacle map calculation process (S303) will be described with reference to
Before the process of calculating the pose of the first camera 121 at the current timing Tn (S304) is performed, the region calculation unit 212 calculates the obstacle map based on two pieces of information, namely the object information (00-3) that was obtained at the immediately previous timing Tn−1 and stored in the main storage unit 132, and the region that was invisible at the immediately previous timing Tn−1 (S303). The immediately previous timing Tn−1 indicates the timing at which the first and second cameras 121 and 122 performed the photography before S301 and S302 are performed.
Hereinafter, a map indicating the regions that can contain an obstacle at the current timing Tn is referred to as the "obstacle map."
A region 506 is a region that can serve as an obstacle at the current timing Tn due to the influence of an object 523 obtained at the immediately previous timing illustrated in
Further, when the flag indicating the abnormality of the frame image (01-2) is transmitted from the camera pose calculation unit 211, the movable region during the period from the immediately previous timing Tn−1 to the current timing Tn may be set wider when the regions 504 to 506 and the regions 511 to 513 are calculated. This covers deviations of the regions 504 to 506 and the regions 511 to 513 caused by an error in the pose of the first camera 121 and suppresses degradation in the object detection performance.
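One way to realize the obstacle map, sketched under the assumption of a 2-D occupancy grid: each object cell from the previous timing is grown by the distance the object (or the robot) could have moved between Tn−1 and Tn. The grid resolution, speed bound, and widening factor are illustrative parameters:

```python
import cv2
import numpy as np

def obstacle_map(prev_objects, cell_m=0.1, max_speed_mps=1.5, dt=0.1,
                 abnormal_flag=False):
    """prev_objects: binary grid (1 = object cell at timing Tn-1).
    Returns a grid of cells that can contain an obstacle at Tn."""
    margin_m = max_speed_mps * dt
    if abnormal_flag:
        margin_m *= 2.0   # widen the movable region when the pose is unreliable
    r = max(1, int(round(margin_m / cell_m)))
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * r + 1, 2 * r + 1))
    return cv2.dilate(prev_objects.astype(np.uint8), kernel)
```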
[S305 to S306] (Calculation of Plurality of Regions in Image and Determination of Processing Method of Each Region):
Next, a process (S305 to S306) of calculating a plurality of regions in an image and determining a processing method of each region will be described with reference to
The region calculation unit 212 acquires the pose of the first camera 121 from the camera pose calculation unit 211, associates the location in the second image with the photographed space based on the acquired pose, and calculates and sets a plurality of regions in the second image photographed by the first camera 121 based on the obtained association (S305). Hereinafter, this process is illustrated in
The region calculation unit 212 virtually arranges a space Si having a width SWi and a depth SDi at a position on the road surface that is apart from the camera position P by a Zw-axis distance Di (i=1, 2, . . . , N) in the world coordinate system (S601).
The region calculation unit 212 modifies the space Si arranged in S601 according to the obstacle map generated in S303 (S602). The modified new space is indicated by Si′.
Further, when the flag indicating the abnormality of the frame image is transmitted from the camera pose calculation unit 211, the calculated pose of the first camera 121 is determined to have a larger error than usual, and thus the space Si may be output as the space Si′ without being modified according to the obstacle map. As a result, it is possible to suppress the degradation in the performance of the object detection unit 215 caused by deviations of the regions 504 to 506 and the regions 511 to 513 occurring due to the error in the pose of the first camera 121.
Further, when the CPU 131 is determined to have spare processing capacity, the space Si may likewise be output as the space Si′ without being modified according to the obstacle map. Thus, even when an object was missed in the detection result of the object detection unit 215 at a previous timing, there is a high possibility that the object will be detected.
The region calculation unit 212 calculates the detection region Ci corresponding to the modified space Si′ in the image I obtained by the photography performed by the first camera 121 (S603).
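A sketch of S601 and S603 under assumed conventions (pinhole intrinsics K, a Y-down world frame so object height extends in −Y, and a box-shaped space Si); the corner layout, object height, and bounding step are illustrative:

```python
import cv2
import numpy as np

def detection_region(Di, SWi, SDi, R_cw, T_cw, K, obj_h=2.0):
    """Project the space Si placed at Zw-distance Di onto the image of the
    first camera and bound the projected corners to obtain Ci (x, y, w, h)."""
    x = SWi / 2.0
    corners_w = np.float32([                       # 8 corners of the box Si
        [-x, 0, Di], [x, 0, Di], [x, 0, Di + SDi], [-x, 0, Di + SDi],
        [-x, -obj_h, Di], [x, -obj_h, Di],
        [x, -obj_h, Di + SDi], [-x, -obj_h, Di + SDi]])
    rvec, _ = cv2.Rodrigues(np.asarray(R_cw, dtype=np.float64))
    pts, _ = cv2.projectPoints(corners_w, rvec,
                               np.asarray(T_cw, dtype=np.float64),
                               K, np.zeros(5))
    return cv2.boundingRect(pts.reshape(-1, 2).astype(np.float32))
```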
Then, the processing method determining unit 213 determines the processing method for each of the detection regions Ci calculated by the region calculation unit 212.
Further, when an object is detected in the space Si′ calculated based on an object that is detected previously such as the spaces 801 to 803 of
Then, the processing method determining unit 213 determines the image processing method including the setting of the resolution for object detection for each of a plurality of regions in the second image calculated by the region calculation unit 212.
The processing method determining unit 213 first calculates the resolution Ri (a pair of Pw and Ph) necessary for photographing an object of a certain size with a predetermined number of pixels for each detection region Ci. In other words, the conversion rate mi/ni (ni>mi) is calculated from the ratio Rl/Ri, using the number Ri of pixels calculated in step S1001 and the number Rl of pixels necessary for photographing the object set as the detection target in advance, and the cutoff frequency fi of the low-pass filter is determined according to the conversion rate (S1002). The number of pixels necessary for the detection process in the object detection unit 215 is set as Rl in advance; for example, 20 to 30 pixels are used as Rl. The cutoff frequency fi of the low-pass filter is set in advance for each possible conversion rate.
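As an illustrative sketch, the conversion rate can be formed as a small rational number mi/ni and the cutoff placed below the post-reduction Nyquist rate; the denominator bound and the factor 0.5 are assumptions:

```python
from fractions import Fraction

def processing_method(Rl, Ri, max_den=8):
    """Conversion rate mi/ni (ni > mi) from the ratio Rl/Ri, plus a
    normalized low-pass cutoff fi chosen to avoid aliasing."""
    rate = Fraction(Rl, Ri).limit_denominator(max_den)
    fi = 0.5 * float(rate)   # keep content below the reduced image's Nyquist rate
    return rate.numerator, rate.denominator, fi
```

For example, Rl=25 and Ri=100 yield a conversion rate of 1/4 and fi=0.125.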
[S307 to S308] (Region Image Generation and Object Detection):
Next, the region image generation and object detection processes (S307 and S308) will be described with reference to
The image generating unit 214 clips the portions of the detection regions Ci from the second image stored in the main storage unit 132, using the detection regions Ci calculated by the region calculation unit 212 and the image processing method (the conversion rate of mi/ni and the cutoff frequency fi of the low-pass filter) determined by the processing method determining unit 213, and then reduces the clipped portions using the conversion rate mi/ni and the cutoff frequency fi of the low-pass filter (S307). As a result, N images serving as the object detection target are generated.
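A sketch of this step with OpenCV, approximating the low-pass filter with a Gaussian whose sigma is derived from fi (that mapping, and the rectangular form of Ci, are assumptions):

```python
import cv2

def region_image(second_image, Ci, mi, ni, sigma):
    """Clip the detection region Ci = (x, y, w, h) from the second image,
    low-pass filter it, and reduce it by the conversion rate mi/ni."""
    x, y, w, h = Ci
    clipped = second_image[y:y + h, x:x + w]
    blurred = cv2.GaussianBlur(clipped, (0, 0), sigma)   # anti-alias low-pass
    return cv2.resize(blurred, (w * mi // ni, h * mi // ni),
                      interpolation=cv2.INTER_AREA)
```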
The necessary number Rl of pixels of the object serving as the detection target is set in advance, and the perspective relation between locations in the image is known. Typically, an object far from the mobile object or an object higher than the mobile object is photographed in an upper portion of the image, and an object close to the mobile object or the road surface is photographed in a lower portion of the image. Thus, the image processing method differs depending on whether the detection region is close to or far from the center P of the camera, which makes it possible to keep the number of pixels of the object serving as the detection target within a certain range over the entire image. By setting the necessary number of pixels of the object small, the object can be detected at high speed. Further, since the number of pixels of the object falls within a certain range, an object detection technique that takes this into consideration can be applied, and the object can be detected at a high detection rate.
Further, when the pose of the camera is inclined in the roll direction, the detection region Ci becomes a parallelogram as illustrated in
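One possible way to handle such a parallelogram region, sketched here as an assumption rather than the patent's prescribed method, is to rotate the whole image by the negative roll angle so the region becomes axis-aligned before clipping (the sign convention depends on how roll is defined):

```python
import cv2

def derotate(image, roll_deg):
    """Rotate the image about its center to cancel the camera roll."""
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), -roll_deg, 1.0)
    return cv2.warpAffine(image, M, (w, h))
```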
The object detection unit 215 performs the object detection process on the N images generated in step S307, integrates the obtained detection result with the object information at the previous timing, stores the integration result in the main storage unit 132, and transmits the integration result to the mobile object control unit 123 (S308).
A known method may be used as the object detection process.
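As one example of such a known method, OpenCV's HOG pedestrian detector could be run on each generated region image; the detector choice and parameters are assumptions, since the patent leaves the detection algorithm open:

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_objects(region_images):
    """Run the pedestrian detector on each of the N region images."""
    results = []
    for img in region_images:
        rects, weights = hog.detectMultiScale(img, winStride=(8, 8))
        results.append(list(zip(rects, weights)))
    return results
```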
As described above, the object detection apparatus 1000 according to the present embodiment includes a pose calculating unit that obtains the pose of a camera, a region setting unit that associates a location in an image with the photographed space based on the obtained pose and sets a plurality of regions on the image photographed through the camera using that association, a processing method setting unit that sets the resolution of the image in each of the plurality of regions, a resolution converting unit that performs resolution conversion according to the set resolution, and a detecting unit that detects an object in each of the plurality of images after the resolution conversion.
According to this feature, the pose of the camera is calculated, the location in the image is associated with the photographed space based on the calculated pose, and the image photographed through the camera is appropriately divided into a plurality of regions based on that association; thus, even when the inclination of the camera in the roll direction is large, the object can be detected at a high detection rate.
Further, when the flag indicating the abnormality of the frame image is transmitted from the camera pose calculation unit 211, the pose of the second camera 122 calculated by the camera pose calculation unit 211 is determined to have a larger error than usual, and thus the object detection process at the current timing may be stopped. As a result, erroneous detection by the object detection unit 215 can be suppressed.
Further, the reliability of the object detection result may be measured and stored in the main storage unit 132 in association with the object information. When the movable region of the object detected at the immediately previous timing is calculated in step S303 by the region calculation unit 212, a margin is set for the movable region, and the margin is variably controlled according to the reliability of the detection result; this makes it possible to generate an appropriate obstacle map and perform an efficient object detection process.
The present invention is not limited to the above embodiment, and various modified examples are included. The above embodiment has been described in detail to facilitate understanding of the present invention, and the present invention is not limited to a configuration necessarily including all the components described above. Further, some components of a certain embodiment may be replaced with components of another embodiment, and components of another embodiment may be added to the components of a certain embodiment. Furthermore, other components may be added to, deleted from, or replace some of the components of each embodiment.
All or some of the above components, functions, processing units, processing means, or the like may be implemented by hardware, for example, by designing them as an integrated circuit (IC). The above components, functions, and the like may also be implemented by software, with a processor interpreting and executing a program that implements each function. Information such as a program, a table, or a file for implementing each function may be stored in a recording apparatus such as a memory, a hard disk, or a solid state drive (SSD), or on a recording medium such as an IC card, an SD card, or a DVD.
Filing Document: PCT/JP2014/058076
Filing Date: 3/24/2014
Country: WO
Kind: 00