The present disclosure relates to a positioning device that determines a position of a moving body.
WO 2016/031105 A discloses an information processing device including a tracking unit that acquires images captured by an imaging unit provided on a moving body and associates feature points in an image captured before motion with feature points in an image captured after the motion, a region estimation unit that acquires information on the motion and estimates, based on the information, a region where changes in two-dimensional positions, as viewed from the moving body, of the feature points before and after the motion are small, and an estimation processing unit that estimates a self-position of the moving body based on the feature points associated with each other by the tracking unit and located in the region. This provides a highly robust information processing device capable of satisfactorily tracking feature points even when the orientation of a camera changes suddenly.
The present disclosure provides a positioning device that efficiently determines a position of a moving body based on motion information on the moving body.
One aspect of the present disclosure provides a positioning device that determines a position of a moving body. The positioning device includes an imaging unit that is mounted on the moving body and captures an image of surroundings of the moving body to acquire the captured image, a detector that detects motion information indicating motion of the moving body, a controller that extracts a feature point from the captured image, and a storage that stores position information indicating a spatial position of the feature point in the surroundings. The controller searches the captured image for an on-image position corresponding to the spatial position indicated by the position information and computes a positional relationship between the spatial position indicated by the position information and the imaging unit to obtain the position of the moving body in the surroundings. The controller sets, based on the motion information detected by the detector, a reference point for use in searching the captured image for the spatial position.
Another aspect of the present disclosure provides a positioning device that determines a position of a moving body. The positioning device includes an imaging unit that is mounted on the moving body and captures an image of surroundings of the moving body to acquire the captured image, a detector that detects motion information indicating motion of the moving body, a controller that extracts a feature point from the captured image, and a storage that stores position information indicating a spatial position of the feature point in the surroundings. The controller searches the captured image for an on-image position corresponding to the spatial position indicated by the position information and computes a positional relationship between the spatial position indicated by the position information and the imaging unit to obtain the position of the moving body in the surroundings. The controller changes, based on the motion information detected by the detector, a search range for use in searching the captured image for the spatial position.
The positioning device according to the present disclosure is capable of efficiently determining the position of the moving body based on the motion information on the moving body.
Hereinafter, embodiments according to the present disclosure will be described with reference to the drawings. Note that, in each of the following embodiments, similar components are denoted by the same reference numerals.
A positioning device according to the first embodiment of the present disclosure is mounted on a moving body such as a manned cargo-handling vehicle, an automated guided vehicle (AGV), or an autonomous mobile cargo-carrying robot, and the positioning device determines a position of the moving body.
For example, Visual Simultaneous Localization and Mapping (Visual SLAM), which determines a self-position and constructs 3D map information based on images captured one after another, is applicable to the positioning device 100.
The positioning device 100 extracts feature points in the image captured by the camera 2. Examples of such feature points include an edge, a corner, and the like of an object, a road, a structure, and the like. The positioning device 100 constructs a 3D map by transforming coordinates on the image of each feature point thus extracted on the image into world coordinates and setting a map point corresponding to the feature point on the image to a world coordinate space. The positioning device 100 causes the camera 2 to capture the images of surroundings of the moving body 1 at a constant frame rate while the moving body 1 is in motion and performs a feature point matching process of associating each feature point on each image thus captured with a map point on the 3D map. The positioning device 100 computes a position and orientation of the camera 2 (hereinafter referred to as a “camera pose”) based on a geometrical positional relationship between a feature point in the current frame and a feature point in the previous frame. The positioning device 100 can obtain a position of the positioning device 100 and in turn a position of the moving body 1 based on the position of the camera 2 thus computed.
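The transformation from image coordinates to world coordinates described above can be illustrated with a minimal sketch. It assumes a pinhole camera model and a depth value taken from the distance image; the function name `pixel_to_world`, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, and the camera-to-world pose `R`, `t` are hypothetical names introduced only for this illustration and do not appear in the disclosure.

```python
def pixel_to_world(u, v, depth, fx, fy, cx, cy, R, t):
    """Back-project pixel (u, v) with a measured depth into world coordinates.

    Assumptions for this sketch: a pinhole camera with focal lengths fx, fy
    and principal point (cx, cy); R is a 3x3 camera-to-world rotation given
    as a list of rows, and t is the camera position in the world frame.
    """
    # Pinhole model: pixel plus depth -> point in the camera coordinate system.
    point_camera = (
        (u - cx) * depth / fx,
        (v - cy) * depth / fy,
        depth,
    )
    # Rigid transform: camera coordinates -> world coordinates.
    return tuple(
        sum(R[i][j] * point_camera[j] for j in range(3)) + t[i]
        for i in range(3)
    )


# Example: a feature point 100 pixels right of the image center, 2 m away,
# seen by a camera located at x = 1 m with no rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
map_point = pixel_to_world(420.0, 240.0, 2.0,
                           fx=500.0, fy=500.0, cx=320.0, cy=240.0,
                           R=identity, t=(1.0, 0.0, 0.0))
print(map_point)
```

The resulting tuple plays the role of a map point set in the world coordinate space; an actual device would also store the feature descriptor alongside it.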
Position information determined by the positioning device 100 is stored in, for example, an external server and may be used for various data management in the surroundings through which the moving body 1 has traveled.
The positioning device 100 may be used to move the moving body 1 based on the position information on the moving body 1 thus computed and the information on the 3D map thus constructed.
The camera 2 is an example of an imaging unit according to the present disclosure. The camera 2 is installed on the moving body 1 and captures the image of the surroundings of the moving body 1 to generate color image data and distance image data. The camera 2 may include a depth sensor such as an RGB-D camera or a stereo camera. Alternatively, the camera 2 may include an RGB camera that captures a color image and a time-of-flight (ToF) sensor that captures a distance image.
The IMU 3 is an example of a detector according to the present disclosure. The IMU 3 includes an accelerometer that detects acceleration of the moving body 1 and a gyroscope that detects angular velocity of the moving body 1.
The controller 4 includes a general-purpose processor such as a CPU or MPU that cooperates with software to implement a predetermined function. The controller 4 loads and executes a program stored in the storage 5 to implement various functions of a feature point extraction unit 41, a feature point matching unit 42, a position computation unit 44, a map management unit 45, and the like, thereby controlling the overall operation of the positioning device 100. For example, the controller 4 executes a program for implementing a positioning method according to the present embodiment or a program for implementing the SLAM algorithm. The controller 4 is not limited to a controller that implements a predetermined function through cooperation between hardware and software, and the controller 4 may be a hardware circuit such as an FPGA, an ASIC, or a DSP customized for implementing the predetermined function.
The storage 5 is a recording medium that stores various information including a program and data necessary for implementing the functions of the positioning device 100. For example, a 3D map 51 and image data are stored in the storage 5. The storage 5 is implemented by any one or combination of storage devices such as a semiconductor memory device such as a flash memory or an SSD, a magnetic storage device such as a hard disk, and a storage device of a different type. The storage 5 may include a volatile memory such as an SRAM or a DRAM capable of high-speed operation for temporarily storing various information. Such a volatile memory serves as, for example, a work area of the controller 4 or a frame buffer that temporarily stores image data on a frame-by-frame basis.
The communication I/F 7 is an interface circuit that enables a communication connection to be established between the positioning device 100 and an external device such as a server 150 over a network 50. The communication I/F 7 makes communications in accordance with a standard such as IEEE802.3, IEEE802.11, or Wi-Fi.
The drive unit 8 is a mechanism that moves the moving body 1 in accordance with an instruction from the controller 4. For example, the drive unit 8 includes a drive circuit of an engine connected to tires of the moving body 1, a steering circuit, and a brake circuit.
First, the controller 4 acquires a captured image captured at time t (S10). Herein, the captured image is image data that is captured by the camera 2 and represents the surroundings of the moving body 1.
Next, the controller 4 serving as the feature point extraction unit 41 analyzes the captured image to extract feature points (S20).
Note that the controller 4 performs not only the process of computing the self-position of the moving body 1 but also the process of constructing the 3D map 51. The controller 4 serving as the map management unit 45 transforms the coordinates of each feature point on the captured image 10 into world coordinates and sets a map point corresponding to the feature point on the captured image 10 to a world coordinate space to construct the 3D map 51. On the 3D map 51, the map points corresponding to the feature points on the captured image 10, a camera frame showing the captured image 10, and a camera pose of the camera 2 when the captured image is captured, are recorded. Information on the 3D map 51 thus constructed is stored in the storage 5. For example, the controller 4 is capable of constructing the 3D map 51 as illustrated in
Returning to
Next, the controller 4 serving as the position computation unit 44 computes the camera pose at time t. The controller 4 is capable of obtaining the position (self-position) of the positioning device 100 and in turn the position (self-position) of the moving body 1 based on the camera pose thus computed (S40). The camera pose at time t is computed based on, for example, the geometrical positional relationship between each feature point on the image captured at time t and a corresponding feature point on the image captured at time t−Δt. Alternatively, the camera pose at time t is computed based on, for example, the camera pose at time t−Δt and a result of detection made by the IMU 3.
The controller 4 repeats the above-described steps S10 to S40 at the predetermined time intervals Δt (S60) until the controller 4 determines the end of process (S50). The end of process is determined, for example, when the user inputs a process termination command. At the end of process (Yes in S50), the controller 4 transmits, to the server 150, information such as the 3D map 51 thus constructed.
A description will be given below of details of the feature point matching step S30 shown in
In
A captured image 10a is an image captured by the camera 2a at time t−2Δt. The captured image 10a contains feature points Fa1 and Fa2 (see step S20 shown in
In
Returning to
A description will be given of step S32 and step S33 with reference to the example shown in
Returning to
Next, the controller 4 computes a degree of similarity between the feature of the projection point and the feature of the feature point in the search range D (S35). Step S35 will be described with reference to
In step S35, the controller 4 computes the degree of similarity between the feature of the projection point P1 and the feature of each of the feature points Fc1, Fc3, Fc4, Fc5 in the predetermined search range D centered around the projection point P1.
Examples of the feature of the feature point include a SURF feature obtained based on Speeded-Up Robust Features (SURF), a SIFT feature obtained based on Scale-Invariant Feature Transform (SIFT), and an ORB feature obtained based on Oriented FAST and Rotated BRIEF (ORB).
The feature of the feature point is represented by, for example, a vector with one or more dimensions. For example, the SURF feature is represented by a 64-dimensional vector, and the SIFT feature is represented by a 128-dimensional vector.
The feature of the projection point is acquired when the feature point is extracted from the captured image captured before time t−Δt, and is stored in the storage 5 together with the feature point.
The degree of similarity computed in step S35 is computed as, for example, a distance such as the Euclidean distance between features.
The controller 4 specifies, subsequent to step S35, the feature point corresponding to the projection point based on the degree of similarity computed in step S35 (S36). In the example shown in
In step S36, when the degree of similarity between the projection point and the feature point is less than a predetermined threshold, the controller 4 does not specify the feature point as the feature point corresponding to the projection point. When there is no feature point having the degree of similarity with the projection point equal to or more than the threshold in the search range D, there is no feature point specified as the feature point corresponding to the projection point. In other words, feature point matching fails.
In step S36, when there are a plurality of feature points having the degree of similarity with the projection point equal to or more than the threshold in the search range D, the controller 4 specifies a feature point having the highest degree of similarity as the feature point corresponding to the projection point.
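Steps S35 and S36 can be sketched as follows. This is only an illustration under stated assumptions: the degree of similarity is derived from the Euclidean distance as 1 / (1 + distance) so that a larger value means a closer match, and the search range D is an axis-aligned rectangle centered around the projection point. The function names, the descriptor values, and the threshold are hypothetical.

```python
import math


def similarity(desc_a, desc_b):
    """Map the Euclidean distance between two features to (0, 1].

    Assumption for this sketch: similarity = 1 / (1 + distance), so identical
    features score 1.0 and the score decreases as the distance grows.
    """
    return 1.0 / (1.0 + math.dist(desc_a, desc_b))


def match_projection_point(proj_uv, proj_desc, features, half_u, half_v, threshold):
    """Return the index of the feature point matched to the projection point.

    features is a list of ((u, v), descriptor) pairs from the current image.
    Returns None when no feature point in the search range reaches the
    threshold, that is, when feature point matching fails.
    """
    best_index, best_score = None, threshold
    for index, ((u, v), desc) in enumerate(features):
        # Skip feature points outside the rectangle centered on the projection point.
        if abs(u - proj_uv[0]) > half_u or abs(v - proj_uv[1]) > half_v:
            continue
        score = similarity(proj_desc, desc)
        # Accept scores at or above the threshold and keep the highest one.
        if score >= best_score:
            best_index, best_score = index, score
    return best_index


features = [
    ((100.0, 100.0), [1.0, 0.0]),   # inside the range, dissimilar feature
    ((105.0, 98.0), [0.0, 1.0]),    # inside the range, identical feature
    ((300.0, 300.0), [0.0, 1.0]),   # identical feature, outside the range
]
matched = match_projection_point((102.0, 100.0), [0.0, 1.0], features,
                                 half_u=20.0, half_v=20.0, threshold=0.5)
print(matched)
```

The third feature point in the example has an identical feature but is never considered because it lies outside the search range, which is the failure mode the embodiments below address.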
The controller 4 determines, subsequent to step S36, whether all the map points in the 3D map 51 have been projected onto the captured image 10c (S37). When not all the map points have been projected (No in S37), the controller 4 returns to step S32, selects one map point that has yet to be projected, and executes steps S33 to S37. When all the map points have been projected (Yes in S37), the feature point matching S30 is brought to an end.
A detailed description will be given below of step S31 of computing the predicted camera pose at time t.
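Each pass through the loop above starts by projecting one map point onto the captured image (step S33). A minimal sketch of such a projection is given here, assuming a pinhole camera model; the function name `project_map_point`, the world-to-camera pose `R_wc`, `t_wc`, and the intrinsic parameters are hypothetical and are not taken from the disclosure.

```python
def project_map_point(point_world, R_wc, t_wc, fx, fy, cx, cy):
    """Project a map point given in world coordinates onto the image plane.

    Assumptions for this sketch: R_wc (3x3, list of rows) and t_wc transform
    world coordinates into the camera coordinate system of the predicted
    camera pose, and the camera follows a pinhole model.
    Returns (u, v), or None when the point lies behind the camera.
    """
    # World coordinates -> camera coordinates.
    xc, yc, zc = (
        sum(R_wc[i][j] * point_world[j] for j in range(3)) + t_wc[i]
        for i in range(3)
    )
    if zc <= 0.0:
        return None  # the map point cannot appear in the captured image
    # Perspective division followed by the intrinsic parameters.
    return (fx * xc / zc + cx, fy * yc / zc + cy)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
projection_point = project_map_point((0.4, 0.0, 2.0), identity, (0.0, 0.0, 0.0),
                                     fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(projection_point)
```

The accuracy of the projection point depends directly on the camera pose supplied, which is why the prediction of the camera pose matters for the matching.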
When the camera pose changes at a steady pace over time, in other words, when the moving body 1 moves to cause the camera pose to change at a steady pace over time, computation of the camera pose of the camera 2c at time t based on a difference between the camera pose of the camera 2a at time t−2Δt and the camera pose of the camera 2b at time t−Δt (see
In reality, when the camera pose shakes or the moving body 1 accelerates or rotates, the camera pose may not change at a steady pace over time. In such a case, as shown in
However, when the moving body 1 significantly accelerates or rotates between time t−Δt and time t, the projection point P1 is projected to a place far away from the feature point Fc1, causing the feature point Fc1 to be located outside the search range D. This prevents the feature point Fc1 that should correspond to the projection point P1 from corresponding to the projection point P1, and the feature point matching fails accordingly.
Therefore, according to the present embodiment, the acceleration and/or angular velocity measured by the IMU 3 shown in
First, the controller 4 acquires, from the IMU 3, the acceleration and angular velocity of the moving body 1 between time t−Δt and time t (S311). Next, the controller 4 computes the amount of change in the camera pose between time t−Δt and time t by integrating both the acceleration and the angular velocity with respect to time (S312).
Next, the controller 4 acquires the camera pose computed at time t−Δt (S313). The camera pose acquired in step S313 is the same as the camera pose computed by the controller 4 in a step corresponding to step S40 (see
Next, the controller 4 computes the predicted camera pose at time t based on the camera pose at time t−Δt acquired in step S313 and the amount of change in the camera pose between time t−Δt and time t computed in step S312 (S314).
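Steps S311 to S314 can be sketched as follows. To keep the illustration short, it is restricted to planar motion, with the pose reduced to (x, y, yaw), which is a simplification introduced here and not a limitation stated in the disclosure. It further assumes that the acceleration is gravity-compensated and expressed in the body frame, and that the velocity at time t−Δt is known. All names are hypothetical.

```python
import math


def predict_pose(pose, velocity, imu_samples):
    """Predict the planar pose at time t from the pose at time t - dt.

    pose is (x, y, yaw) in the world frame and velocity is (vx, vy) in the
    world frame, both at time t - dt. imu_samples is a list of
    (ax, ay, wz, dt) tuples measured between t - dt and t: body-frame
    acceleration, yaw rate, and sample period.
    """
    x, y, yaw = pose
    vx, vy = velocity
    for ax, ay, wz, dt in imu_samples:
        # Rotate the body-frame acceleration into the world frame.
        c, s = math.cos(yaw), math.sin(yaw)
        ax_w = c * ax - s * ay
        ay_w = s * ax + c * ay
        # Integrate acceleration twice for position and once for velocity.
        x += vx * dt + 0.5 * ax_w * dt * dt
        y += vy * dt + 0.5 * ay_w * dt * dt
        vx += ax_w * dt
        vy += ay_w * dt
        # Integrate the angular velocity for orientation.
        yaw += wz * dt
    return (x, y, yaw), (vx, vy)


# Example: constant speed of 1 m/s along x while turning at 0.5 rad/s for 1 s.
samples = [(0.0, 0.0, 0.5, 0.1)] * 10
predicted_pose, predicted_velocity = predict_pose((0.0, 0.0, 0.0), (1.0, 0.0), samples)
print(predicted_pose)
```

A full implementation would integrate three-dimensional rotation and handle sensor bias and gravity; those aspects are left out of this sketch.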
According to the present embodiment, the acceleration and/or angular velocity measured by the IMU 3 is reflected in the prediction of the camera pose to allow the feature point matching to be efficiently performed even when the moving body 1 accelerates or rotates.
As described above, the positioning device 100 according to the present embodiment determines the position of the moving body 1. The positioning device 100 includes the camera 2 that is mounted on the moving body 1 and captures an image of surroundings of the moving body 1 to acquire the captured image, the IMU 3 that detects motion information such as acceleration and angular velocity indicating motion of the moving body 1, the controller 4 that extracts feature points from the captured images 10a, 10b, and the storage 5 that stores the map points M1, M2 each indicating a spatial position of a corresponding feature point in the surroundings. The controller 4 searches the captured image 10c for an on-image position corresponding to the spatial position indicated by each of the map points M1, M2 (S30) and computes a positional relationship between the spatial position indicated by each of the map points M1, M2 and the camera 2 to obtain the position of the moving body 1 in the surroundings (S40). The controller 4 sets, based on the motion information detected by the IMU 3, a reference point P1 for use in searching the captured image 10c for the spatial position.
The positioning device 100 according to the present embodiment can efficiently perform, even when the moving body 1 accelerates or rotates, the feature point matching by searching the captured image 10c for the spatial position based on the motion information detected by the IMU 3.
According to the present embodiment, the IMU 3 may detect motion information between a first time t−Δt and a subsequent second time t, during which the camera 2 moves. The controller 4 can predict the camera pose at the second time t from the camera pose at the first time t−Δt based on the motion information (S31).
According to the present embodiment, the detector, of which the IMU 3 is an example, includes at least one of an inertial measurement unit, an accelerometer, or a gyroscope.
According to the present embodiment, the captured image includes a distance image and a color image.
A description will be given, according to the second embodiment, of an example where the search range D is changed in size based on the result of measurement made by the IMU 3.
The controller 4 of the positioning device 200 computes the position of the moving body 1 by executing steps S10 to S60 as shown in
In step S34b, the controller 4 serving as a feature point matching unit 242 specifies the search range D based on the acquired result of measurement made by the IMU 3 such as the angular velocity. In other words, the controller 4 changes the size of the search range D based on the result of measurement made by the IMU 3 such as the value of the angular velocity.
The captured image 10 captured by the camera 2 is in an image plane. Each point in the captured image 10 is represented by u and v coordinates, orthogonal to each other, in an image coordinate system.
The position of the map point M in the 3D map 51 may be represented by the camera coordinate system or by the world coordinates X, Y and Z. The map point M is projected onto the captured image 10 in step S33 shown in
For example, when both the acceleration and angular velocity detected by the IMU 3 are zero, the controller 4 sets a rectangle having a length u0 in the u direction and a length v0 in the v direction centered around the projection point P as the search range D in the acquisition step S34b. u0 and v0 denote initial values of the lengths of the predetermined search range D in the u and v directions. The lengths in the u and v directions are represented by, for example, the number of pixels.
Further, for example, when a determination is made based on the angular velocity detected by the IMU 3 that the camera 2 has rotated (yawed) about the y-axis, the controller 4 sets the length of the search range D in the u direction to u1 that is greater than u0. For example, the larger the angular velocity about the y-axis, the larger the difference between u1 and u0.
Likewise, for example, when a determination is made based on the angular velocity detected by the IMU 3 that the camera 2 has rotated (pitched) about the x-axis, the controller 4 sets the length of the search range D in the v direction to v1 that is greater than v0. For example, the larger the angular velocity about the x-axis, the larger the difference between v1 and v0.
Likewise, for example, when a determination is made based on the angular velocity detected by the IMU 3 that the camera 2 has rotated (rolled) about the z-axis, the controller 4 rotates the search range D in the rolling direction. For example, the larger the angular velocity about the z-axis, the larger the rotation angle.
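The adjustment of the search range D according to the angular velocity can be sketched as follows. The disclosure only states that a larger angular velocity yields a larger enlargement or rotation; the linear relationship and the parameter `gain_px_per_rad` used here are assumptions made for illustration, and all names are hypothetical.

```python
def adjust_search_range(u0, v0, omega_x, omega_y, omega_z, dt, gain_px_per_rad):
    """Return (u1, v1, roll) for the search range D.

    omega_x, omega_y, omega_z are angular velocities [rad/s] about the x-axis
    (pitch), y-axis (yaw), and z-axis (roll) of the camera, and dt is the
    frame interval [s]. A linear enlargement is assumed for this sketch.
    """
    # Yaw about the y-axis shifts image content horizontally: widen in u.
    u1 = u0 + gain_px_per_rad * abs(omega_y) * dt
    # Pitch about the x-axis shifts image content vertically: widen in v.
    v1 = v0 + gain_px_per_rad * abs(omega_x) * dt
    # Roll about the z-axis rotates image content: rotate the range by the same angle.
    roll = omega_z * dt
    return u1, v1, roll


# No rotation keeps the initial values; yawing enlarges only the u direction.
print(adjust_search_range(40.0, 40.0, 0.0, 0.0, 0.0, dt=0.1, gain_px_per_rad=500.0))
print(adjust_search_range(40.0, 40.0, 0.0, 0.6, 0.0, dt=0.1, gain_px_per_rad=500.0))
```

One plausible way to choose the gain is from the focal length in pixels, since a small rotation of the camera shifts image content by roughly the focal length times the rotation angle.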
When the controller 4 detects vibrations based on the acceleration and/or angular velocity detected by the IMU 3, the search range D may be made larger than the initial value (u0*v0). For example, the controller 4 determines that, when acceleration ay in the y-axis direction has fluctuated between positive and negative a predetermined threshold number of times or more between time t−Δt and t as shown in
The size of the search range D is determined based on, for example, how large the absolute value of the acceleration ay between time t−Δt and t is. In the example shown in
Likewise, for example, the controller 4 determines that, when acceleration ax in the x-axis direction has fluctuated between positive and negative the predetermined threshold number of times or more between time t−Δt and t, the IMU 3 and in turn the moving body 1 has vibrated in the x-axis direction. When a determination is made that the moving body 1 has vibrated in the x-axis direction, the controller 4 sets, for example, the length of the search range D in the u direction to u1 that is greater than the initial value u0.
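The vibration determination described above can be sketched as counting how often the acceleration changes sign between time t−Δt and time t. The enlargement rule, which adds a term proportional to the largest absolute acceleration, and the parameter names `min_changes` and `gain_px_per_accel` are assumptions introduced for this illustration.

```python
def count_sign_changes(samples):
    """Count transitions between positive and negative values, ignoring zeros."""
    changes = 0
    previous = 0.0
    for value in samples:
        if value == 0.0:
            continue
        if previous != 0.0 and (value > 0.0) != (previous > 0.0):
            changes += 1
        previous = value
    return changes


def vibration_search_length(initial_length, accel_samples, min_changes, gain_px_per_accel):
    """Return the search range length in one direction after the vibration check.

    The length stays at its initial value unless the acceleration fluctuated
    between positive and negative at least min_changes times; in that case it
    grows with the largest absolute acceleration (assumed linear rule).
    """
    if count_sign_changes(accel_samples) < min_changes:
        return initial_length
    peak = max(abs(a) for a in accel_samples)
    return initial_length + gain_px_per_accel * peak


ay_samples = [1.0, -1.0, 1.0, -1.0, 1.0]  # acceleration in the y-axis direction
print(count_sign_changes(ay_samples))
print(vibration_search_length(40.0, ay_samples, min_changes=3, gain_px_per_accel=2.0))
```

The same function can be applied to the acceleration in the x-axis direction to obtain the length in the u direction.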
As described above, according to the present embodiment, the controller 4 changes the search range D for use in searching the captured image 10c for the spatial position based on the motion information detected by the IMU 3 (S34b). This can prevent a situation where feature points in the current frame (captured image at time t) to be associated with feature points in the previous frame (captured image at time t−Δt) fall outside the search range D due to a change in the camera pose caused by the rotation or acceleration of the moving body 1. This in turn increases the efficiency of the feature point matching and the accuracy of computation of the position of the moving body 1.
A description will be given, according to the third embodiment of the present disclosure, of an example of determining a region where no feature point search is made based on the result of measurement made by the IMU 3.
According to the third embodiment, the controller 4 computes the position of the moving body 1 by executing steps S10 to S60 as shown in
In step S34c, the controller 4 specifies the search range D based on the acquired result of measurement made by the IMU 3 such as the angular velocity.
A region S in the captured image 310b at time t shown in
The controller 4 determines the position and size of the new region S in the captured image based on the acquired result of measurement made by the IMU 3. For example, the controller 4 acquires the angular velocity detected by the IMU 3 between time t−Δt and time t and integrates the angular velocity thus acquired to compute a rotation angle φ of the camera 2 between time t−Δt and time t. The controller 4 computes the position and size of the new region S based on the rotation angle φ thus computed, the rotation direction, and an internal parameter of the camera 2.
For example, assuming that the camera 2 rotates about the y-axis by a rotation angle φu between time t−Δt and time t, a length us [pixel] of the new region S in the u direction shown in
$$u_s = U \cdot \frac{\varphi_u}{\theta_u} \tag{1}$$
where U [pixel] represents the total length of the captured image 310b in the u direction, and θu represents the angle of view of the camera 2 in the u direction.
Likewise, assuming that the camera 2 rotates about the x-axis by a rotation angle φv between time t−Δt and time t, a length vs [pixel] of the new region S in the v direction can be computed by the following equation (2):
$$v_s = V \cdot \frac{\varphi_v}{\theta_v} \tag{2}$$
where V [pixel] represents the total length of the captured image 310b in the v direction, and θv represents the angle of view of the camera 2 in the v direction.
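Equations (1) and (2) share the same form, so a single helper suffices to evaluate both. The sketch below assumes that the rotation angle and the angle of view are given in the same unit; the clamp to the image length and the numeric values are illustrative additions and are not taken from the disclosure.

```python
def new_region_length(total_pixels, rotation_angle, view_angle):
    """Evaluate equations (1) and (2): length = total * rotation / angle of view.

    rotation_angle and view_angle must share one unit (radians here).
    The result is clamped to the image length, an assumption of this sketch
    for rotations larger than the angle of view.
    """
    length = total_pixels * abs(rotation_angle) / view_angle
    return min(float(total_pixels), length)


# Example: a 640 x 480 image, an angle of view of 1.0 rad horizontally and
# 0.75 rad vertically, and a rotation of 0.1 rad about the y-axis only.
u_s = new_region_length(640, 0.1, 1.0)    # equation (1)
v_s = new_region_length(480, 0.0, 0.75)   # equation (2)
print(u_s, v_s)
```

With these values, the new region S spans about 64 pixels in the u direction (640 × 0.1 / 1.0) and has no extent in the v direction.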
As described above, according to the present embodiment, the controller 4 restricts the search range D based on the angle of view of the captured image 10 captured by the camera 2.
That is, the controller 4 computes the position and size of the new region S in the captured image based on the acquired result of measurement made by the IMU 3 and excludes the feature points in the new region S from the feature point matching target. This eliminates the need for associating the feature points in the new region S in the current frame (captured image at time t) 310b with the feature points in the previous frame (captured image at time t−Δt) 310a, which increases the efficiency of the feature point matching and in turn the accuracy of computation of the position of the moving body 1. Further, this makes the number of feature points, i.e. the feature point matching target, smaller in the current frame (captured image at time t) 310b, allowing a reduction in computational load on the controller 4.
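The exclusion of feature points located in the new region S can be sketched as a simple filter. The sketch treats only a rotation about the y-axis, so that the new region S is a vertical strip along one edge of the image; which edge the strip occupies depends on the rotation direction and on the sign convention of the camera axes, so it is passed in explicitly as the hypothetical parameter `side`.

```python
def exclude_new_region(feature_points, image_width, region_width, side):
    """Drop feature points that fall inside the new region S.

    feature_points is a list of (u, v) coordinates in the current frame.
    side is "left" or "right" and names the image edge occupied by the new
    region; it is an assumed input derived from the rotation direction.
    """
    if side == "right":
        return [(u, v) for (u, v) in feature_points if u < image_width - region_width]
    if side == "left":
        return [(u, v) for (u, v) in feature_points if u >= region_width]
    raise ValueError("side must be 'left' or 'right'")


points = [(10.0, 50.0), (320.0, 240.0), (630.0, 100.0)]
# With a 64-pixel strip on the right edge, only the last point is excluded.
print(exclude_new_region(points, image_width=640, region_width=64.0, side="right"))
```

Only the remaining feature points are passed on to the feature point matching, which is how the number of matching targets in the current frame is reduced.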
As described above, the first to third embodiments have been described as examples of the technique disclosed in the present application. However, the technique according to the present disclosure is not limited to the embodiments and is applicable to embodiments in which changes, replacements, additions, omissions, or the like are made as appropriate. Further, it is also possible to combine the respective components described in the first to third embodiments to form a new embodiment.
The present disclosure is applicable to a positioning device that determines a position of a moving body.
Number | Date | Country | Kind |
---|---|---|---|
2018-247816 | Dec 2018 | JP | national |
The present application is a continuation of PCT/JP2019/046198 filed on Nov. 26, 2019, which claims priority to Japanese Patent Application No. 2018-247816, filed on Dec. 28, 2018, the entire contents of which are incorporated herein by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2019/046198 | Nov 2019 | US |
Child | 17357173 | | US |