The present disclosure relates to a map creation device, a map creation method, and a map creation program.
Conventionally, there is a technique in which a surrounding environment is photographed with a camera mounted on a vehicle and the position and orientation of the vehicle are estimated from photographed images (e.g., Steven Lovegrove, Andrew J. Davison and Javier Ibanez-Guzman, “Accurate Visual Odometry from a Rear Parking Camera”, Intelligent Vehicles Symposium, 2011; hereinafter, referred to as “Non Patent Literature 1”.).
According to the technique described in Non Patent Literature 1, the position and orientation of the vehicle are estimated by determining, through iterative computation, an optimal value of a homography matrix between images of a road surface.
However, in the technique described in the above-described Non Patent Literature 1, the amount of iterative computation required to determine the optimal value of the homography matrix is large, which prevents a reduction in computational cost. Hence, a reduction in the amount of iterative computation is sought.
The present disclosure has been made in view of the above-described circumstances, and provides a map creation device, a map creation method, and a map creation program that can reduce the amount of iterative computation when calculating three-dimensional positions of features in a plurality of images obtained by photographing different locations.
To provide the above-described map creation device, a map creation device according to a first aspect includes: an image obtaining part that obtains a plurality of images from an in-vehicle camera that is mounted on a vehicle and photographs a surrounding of the vehicle, the plurality of images being obtained by photographing different locations; an odometry information calculating part that calculates odometry information indicating an amount of movement of the vehicle; an initial value calculating part that calculates an initial value of a homography matrix between the plurality of images from the odometry information of the vehicle; an optimal value calculating part that calculates, by iterative computation, an optimal value of the homography matrix from the initial value and a luminance value of each pixel included in a road-surface region specified in the plurality of images; a camera position and orientation calculating part that calculates an amount of change in camera position and an amount of change in camera orientation of the in-vehicle camera by resolving the optimal value; and a three-dimensional position calculating part that calculates three-dimensional positions of features in the plurality of images from the amount of change in camera position and the amount of change in camera orientation.
In addition, a map creation device according to a second aspect is such that in the map creation device according to the first aspect, the camera position and orientation calculating part further calculates an estimated value of a road surface's normal vector by resolving the optimal value, the road surface's normal vector being a vector in a normal direction of a road surface viewed from the in-vehicle camera, and the map creation device further includes a use determining part that determines to use the amount of change in camera position and the amount of change in camera orientation, when an error is less than a threshold value, the error being represented by an angle between the estimated value of the road surface's normal vector and a value of a road surface's normal vector that is determined in advance by calibration of the in-vehicle camera.
In addition, a map creation device according to a third aspect is such that in the map creation device according to the second aspect, the use determining part determines not to use the amount of change in camera position and the amount of change in camera orientation, when the error is greater than or equal to the threshold value.
Furthermore, to provide the above-described map creation method, a map creation method according to a fourth aspect includes: obtaining a plurality of images from an in-vehicle camera that is mounted on a vehicle and photographs a surrounding of the vehicle, the plurality of images being obtained by photographing different locations; calculating odometry information indicating an amount of movement of the vehicle; calculating an initial value of a homography matrix between the plurality of images from the odometry information of the vehicle; calculating, by iterative computation, an optimal value of the homography matrix from the initial value and a luminance value of each pixel included in a road-surface region specified in the plurality of images; calculating an amount of change in camera position and an amount of change in camera orientation of the in-vehicle camera by resolving the optimal value; and calculating three-dimensional positions of features in the plurality of images from the amount of change in camera position and the amount of change in camera orientation.
Furthermore, to provide the above-described map creation program, a map creation program according to a fifth aspect causes a computer to: obtain a plurality of images from an in-vehicle camera that is mounted on a vehicle and photographs a surrounding of the vehicle, the plurality of images being obtained by photographing different locations; calculate odometry information indicating an amount of movement of the vehicle; calculate an initial value of a homography matrix between the plurality of images from the odometry information of the vehicle; calculate, by iterative computation, an optimal value of the homography matrix from the initial value and a luminance value of each pixel included in a road-surface region specified in the plurality of images; calculate an amount of change in camera position and an amount of change in camera orientation of the in-vehicle camera by resolving the optimal value; and calculate three-dimensional positions of features in the plurality of images from the amount of change in camera position and the amount of change in camera orientation.
According to a technique of the present disclosure, there is an advantageous effect that when three-dimensional positions of features in a plurality of images obtained by photographing different locations are calculated, the amount of iterative computation can be reduced.
An example of a mode for carrying out a technique of the present disclosure will be described in detail below with reference to the drawings. Note that components and processes that have the same operation, action, or function are given the same reference signs throughout all drawings, and an overlapping description thereof may be omitted as appropriate. Each drawing is merely schematically shown to the extent that the technique of the present disclosure can be adequately understood. Thus, the technique of the present disclosure is not limited only to examples shown in the drawings. In addition, in the present embodiment, a description of configurations that are not directly related to the technique of the present disclosure or well-known configurations may be omitted.
A map creation device according to the present embodiment relates to initialization of a map performed when a point cloud map is created in a framework of a Visual Simultaneous Localization and Mapping (SLAM) technique that uses an in-vehicle camera. In the initialization of a map, first, there is a need to determine the amount of change in position and the amount of change in orientation of the in-vehicle camera from images obtained by photographing two different locations. However, in a scene with very few features that can be detected from images, it is difficult to accurately determine the amount of change in position and the amount of change in orientation of the in-vehicle camera from the features. Hence, the amount of change in position and the amount of change in orientation of the in-vehicle camera are determined from the luminance value of each pixel in a road-surface region.
As shown in the drawings, the map creation device 10 according to the present embodiment is communicably connected to a wheel speed sensor 20, a steering angle sensor 21, and an in-vehicle camera 22.
The in-vehicle camera 22 is mounted on a vehicle and photographs surroundings of the vehicle. Where to install the in-vehicle camera 22 on the vehicle is not particularly limited as long as the in-vehicle camera 22 is installed to allow photographing of a road surface. For the in-vehicle camera 22, for example, a monocular camera is applied, but the in-vehicle camera 22 is not limited thereto and a stereo camera, etc., may be used.
Specifically, the in-vehicle camera 22 is a monocular camera provided at the top of the vehicle, etc., and photographs surrounding areas such as in front of or behind the vehicle. The in-vehicle camera 22 is, for example, provided near a substantially central portion in a vehicle width direction and disposed such that an optical axis of the in-vehicle camera 22 faces slightly downward relative to a horizontal direction. The in-vehicle camera 22 is communicably connected to the map creation device 10, and sends photographed images to the map creation device 10.
The wheel speed sensor 20 detects the wheel speeds of the four wheels of the vehicle. The wheel speed sensor 20 sends the detected wheel speeds to the map creation device 10. For the wheel speed sensor 20, a wheel encoder is normally used, but for a vehicle including a motor, such as a hybrid vehicle, a motor encoder may be used. The motor encoder is desirable because it has higher detection accuracy than the wheel encoder.
The steering angle sensor 21 detects a steering angle of the vehicle. The steering angle sensor 21 sends the detected steering angle to the map creation device 10.
The map creation device 10 according to the present embodiment calculates an initial value of a homography matrix between a plurality of images from odometry information of the vehicle, and calculates an optimal value of the homography matrix using the calculated initial value. By this, the amount of iterative computation can be reduced. The map creation device 10 according to the present embodiment is intended for in-vehicle use. In the case of in-vehicle use, the processing capabilities of resources such as a processor and a memory are relatively low; reducing the amount of iterative computation therefore decreases the load on these resources, so that a greater advantageous effect is obtained.
Specifically, the map creation device 10 may be implemented as a part of an Electronic Control Unit (ECU) which is a computer for vehicle control, or may be implemented as an in-vehicle computer different from the ECU.
The map creation device 10 includes a Central Processing Unit (CPU) 11, a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, an input/output interface (I/O) 14, a storage part 15, and an external interface (external I/F) 16.
The CPU 11, the ROM 12, the RAM 13, and the I/O 14 are connected to each other through a bus. Functional parts including the storage part 15 and the external I/F 16 are connected to the I/O 14. Each functional part can mutually communicate with the CPU 11 through the I/O 14.
The CPU 11, the ROM 12, the RAM 13, and the I/O 14 form a control part. The control part may be formed as a sub-control part that controls a part of operation of the map creation device 10, or may be formed as a part of a main control part that controls the overall operation of the map creation device 10. For some or all of blocks of the control part, for example, an integrated circuit such as Large Scale Integration (LSI) or an Integrated Circuit (IC) chipset is used. For the above-described blocks, individual circuits may be used or a partially or fully integrated circuit may be used. All blocks may be provided in a one-piece design or some blocks may be provided separately. In addition, for each block, a part of the block may be provided separately. Integration of the control part is not limited to LSI, and a dedicated circuit or a general-purpose processor may be used.
For the storage part 15, for example, a Hard Disk Drive (HDD), a Solid State Drive (SSD), a flash memory, etc., are used. The storage part 15 stores therein a map creation program 15A according to the present embodiment. Note that the map creation program 15A may be stored in the ROM 12.
The map creation program 15A may be, for example, installed in advance on the map creation device 10. The map creation program 15A may be implemented such that the map creation program 15A is stored in a non-volatile storage medium or distributed through a network, and installed on the map creation device 10 as appropriate. Note that possible examples of the non-volatile storage medium include a Compact Disc Read Only Memory (CD-ROM), a magneto-optical disk, an HDD, a Digital Versatile Disc Read Only Memory (DVD-ROM), a flash memory, and a memory card.
The external I/F 16 is an interface for communicably establishing connection with each of the wheel speed sensor 20, the steering angle sensor 21, and the in-vehicle camera 22.
The CPU 11 of the map creation device 10 according to the present embodiment executes the map creation program 15A, thereby functioning as each of the parts described below.
As shown in the drawings, the CPU 11 functions as an image obtaining part 11A, a sensor information obtaining part 11B, an odometry information calculating part 11C, an initial value calculating part 11D, an optimal value calculating part 11E, a camera position and orientation calculating part 11F, a use determining part 11G, and a three-dimensional position calculating part 11H.
The image obtaining part 11A obtains a plurality of images from the in-vehicle camera 22. The image obtaining part 11A sends the obtained plurality of images to the optimal value calculating part 11E. The plurality of images are, for example, images obtained by photographing two different locations.
The sensor information obtaining part 11B obtains wheel speed detected by the wheel speed sensor 20 and a steering angle detected by the steering angle sensor 21. The sensor information obtaining part 11B sends the obtained wheel speed and steering angle to the odometry information calculating part 11C.
The odometry information calculating part 11C calculates odometry information indicating the amount of movement of the vehicle, based on the wheel speed and steering angle sent from the sensor information obtaining part 11B. Specifically, the odometry information calculating part 11C calculates a distance traveled by the vehicle, based on the wheel speed and calculates a turning radius of the vehicle, based on the steering angle.
As shown in the drawings, the odometry information calculating part 11C determines, from the travel distance and the turning radius, the amount of change in position (ΔXv, ΔYv) and the amount of change in yaw angle Δθv of the vehicle between the photographing locations of the plurality of images, and sends the determined amounts to the initial value calculating part 11D as the odometry information.
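For illustration, the following is a minimal sketch of how such odometry might be accumulated from the wheel speed and the steering angle under a simple bicycle (single-track) motion model; the wheelbase value, the function interface, and the sampling scheme are hypothetical and are not taken from the disclosure.

```python
import math

def update_odometry(x_v, y_v, theta_v, wheel_speed, steering_angle, dt,
                    wheelbase=2.7):
    """Integrate the planar vehicle pose with a bicycle model (illustrative).

    x_v, y_v, theta_v : current planar pose of the vehicle [m, m, rad]
    wheel_speed       : vehicle speed derived from the wheel speed sensor [m/s]
    steering_angle    : front-wheel steering angle [rad]
    dt                : sampling interval [s]
    wheelbase         : hypothetical distance between axles [m]
    """
    distance = wheel_speed * dt                               # travel distance
    yaw_rate = wheel_speed * math.tan(steering_angle) / wheelbase
    theta_v += yaw_rate * dt                                  # heading change
    x_v += distance * math.cos(theta_v)
    y_v += distance * math.sin(theta_v)
    return x_v, y_v, theta_v

# The amounts of change between the two photographing locations are then the
# pose differences: (dXv, dYv) and d_theta_v.
```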
The initial value calculating part 11D calculates an initial value of a homography matrix between the plurality of images, based on the odometry information of the vehicle sent from the odometry information calculating part 11C. Note that a homography refers to projection of a plane to another plane using a projective transformation. Specifically, the initial value calculating part 11D computes a translation vector tc representing a change in the position of the in-vehicle camera 22, from the amount of change in position (ΔXv, ΔYv), and computes a rotation matrix Rc representing a change in the orientation of the in-vehicle camera 22, from the amount of change in yaw angle Δθv.
As shown in the drawings, the road surface viewed from the in-vehicle camera 22 is treated as a plane.
The initial value calculating part 11D calculates an initial value G0 of a homography matrix using the following equation (1).

G0 = K(Rc + tc nT/h)K−1   (1)
Note that K represents an internal parameter matrix of the in-vehicle camera 22, Rc represents the rotation matrix, tc represents the translation vector, nT represents the transpose of the road surface's normal vector n, and h represents the installation height of the in-vehicle camera 22 from the road surface. Note that for the internal parameter matrix K, the road surface's normal vector n, and the installation height h, values that are determined in advance by calibration of the in-vehicle camera 22 are used.
The initial value calculating part 11D sends the initial value G0 calculated using the above-described equation (1) to the optimal value calculating part 11E.
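As an illustration of equation (1), the sketch below composes the initial value G0 from the camera-frame motion. The example rotation axis and all numeric values are hypothetical, and the mapping from (ΔXv, ΔYv, Δθv) to (Rc, tc) depends on how the camera is mounted, which is assumed here.

```python
import numpy as np

def initial_homography(K, R_c, t_c, n, h):
    """Equation (1): G0 = K (Rc + tc n^T / h) K^-1.

    K   : 3x3 camera internal parameter matrix
    R_c : 3x3 rotation of the camera between the two shots
    t_c : 3-vector translation of the camera between the two shots
    n   : road surface's normal vector in the camera frame (calibration)
    h   : camera installation height above the road surface (calibration)
    """
    return K @ (R_c + np.outer(t_c, n) / h) @ np.linalg.inv(K)

# Hypothetical example: a small yaw change expressed as a rotation about the
# camera's vertical axis, with forward motion along the optical axis.
d_theta = 0.02                                   # yaw change [rad]
R_c = np.array([[ np.cos(d_theta), 0.0, np.sin(d_theta)],
                [ 0.0,             1.0, 0.0            ],
                [-np.sin(d_theta), 0.0, np.cos(d_theta)]])
t_c = np.array([0.01, 0.0, 0.85])                # camera translation [m]
K   = np.array([[800.0, 0.0, 320.0],
                [0.0, 800.0, 240.0],
                [0.0,   0.0,   1.0]])
n   = np.array([0.0, -1.0, 0.0])                 # road normal seen from camera
h   = 1.2                                        # installation height [m]
G0  = initial_homography(K, R_c, t_c, n, h)
```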
The optimal value calculating part 11E calculates, by iterative computation, an optimal value of the homography matrix from the initial value sent from the initial value calculating part 11D and the luminance value of each pixel included in a road-surface region specified in the plurality of images sent from the image obtaining part 11A. Specifically, as an example, a tracking region is specified within the road-surface region.
In the image obtained before movement of the vehicle, the luminance value of each pixel in the tracking region serves as a reference, and the estimated value of the homography matrix is iteratively updated so that the luminance difference between the tracking region and the corresponding region in the image obtained after movement becomes small.
According to this example, the estimated value of the homography matrix at the time the iterative computation converges is adopted as the optimal value GOPT.
The camera position and orientation calculating part 11F calculates the amount of change in camera position and the amount of change in camera orientation of the in-vehicle camera 22 by resolving the optimal value sent from the optimal value calculating part 11E. The amount of change in camera position is represented as an estimated value test of the translation vector, and the amount of change in camera orientation is represented as an estimated value Rest of the rotation matrix. In addition, the camera position and orientation calculating part 11F calculates an estimated value of the road surface's normal vector by resolving the optimal value sent from the optimal value calculating part 11E. Specifically, the optimal value GOPT is resolved as shown in the following equation (2).

GOPT = K(Rest + test nestT/h)K−1   (2)
Note that K represents the internal parameter matrix of the in-vehicle camera 22, Rest represents the rotation matrix (estimated value) representing the amount of change in camera orientation, test represents the translation vector (estimated value) representing the amount of change in camera position, nestT represents the transpose of the estimated value nest of the road surface's normal vector, and h represents the installation height of the in-vehicle camera 22 from the road surface.
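The disclosure does not detail how the optimal value is resolved. As one possible sketch, OpenCV's cv2.decomposeHomographyMat can recover candidate rotations, translations, and plane normals from GOPT and K; a candidate can then be selected, for example, by comparing each returned normal against the calibrated road surface's normal vector n. The selection rule and function name below are assumptions of this sketch, not steps prescribed by the disclosure.

```python
import cv2
import numpy as np

def resolve_homography(G_opt, K, n_calib):
    """Sketch: decompose G_opt into candidate (R, t, n) triples and pick the
    candidate whose plane normal is closest to the calibrated road normal."""
    _, rotations, translations, normals = cv2.decomposeHomographyMat(G_opt, K)
    angles = [
        np.arccos(np.clip(
            np.dot(n.ravel(), n_calib)
            / (np.linalg.norm(n) * np.linalg.norm(n_calib)), -1.0, 1.0))
        for n in normals
    ]
    best = int(np.argmin(angles))
    # decomposeHomographyMat returns t scaled by the inverse plane distance,
    # so multiplying by the known installation height h would restore metric
    # scale (an assumption of this sketch).
    return rotations[best], translations[best], normals[best]
```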
The use determining part 11G determines to use the amount of change in camera position and the amount of change in camera orientation in a subsequent process (i.e., a process of calculating three-dimensional positions), when an error which is represented by an angle between the estimated value nest of the road surface's normal vector and a value n of a road surface's normal vector which is determined in advance by calibration of the in-vehicle camera 22 is less than a threshold value. On the other hand, when the error is greater than or equal to the threshold value, the use determining part 11G determines not to use the amount of change in camera position and the amount of change in camera orientation in the subsequent process. When the amount of change in camera position and the amount of change in camera orientation are not used, an optimal value of a homography matrix is determined again from a different set of images obtained by photographing two different locations. The threshold value can be set to an appropriate value in a range greater than 0 degrees and less than or equal to 5 degrees. Note that the use determining part 11G is not essential and a configuration that does not include the use determining part 11G may be adopted. In this case, the amount of change in camera position and the amount of change in camera orientation that are calculated by the camera position and orientation calculating part 11F are used as they are in the subsequent process.
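A minimal sketch of this use determination follows; the function name and the degree-based interface are illustrative, and the 5-degree default simply mirrors the upper end of the range mentioned above.

```python
import numpy as np

def can_use_motion(n_est, n_calib, threshold_deg=5.0):
    """Return True when the angle between the estimated and calibrated
    road-surface normal vectors is less than the threshold."""
    cos_angle = np.dot(n_est, n_calib) / (
        np.linalg.norm(n_est) * np.linalg.norm(n_calib))
    error_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return error_deg < threshold_deg
```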
The three-dimensional position calculating part 11H calculates three-dimensional positions of features in the plurality of images from the amount of change in camera position and the amount of change in camera orientation, when the use determining part 11G determines to use the amount of change in camera position and the amount of change in camera orientation. Specifically, the three-dimensional position calculating part 11H computes, using the principle of triangulation, three-dimensional positions of corresponding features between the images of two different locations from the positions of the features, the amount of change in camera position, and the amount of change in camera orientation.
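As an illustration of this triangulation, the sketch below applies the standard linear (DLT) two-view method, taking the camera pose before movement as the origin. This concrete formulation is an assumption, since the disclosure states only that the principle of triangulation is used.

```python
import numpy as np

def triangulate(K, R_est, t_est, p1, p2):
    """Triangulate one feature observed at pixel p1 (before movement) and
    p2 (after movement), given the camera motion (R_est, t_est).

    Returns the 3D position in the first camera's coordinate system.
    """
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])     # camera before
    P2 = K @ np.hstack([R_est, t_est.reshape(3, 1)])      # camera after
    A = np.vstack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]                                   # dehomogenize
```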
Next, with reference to the flowcharts shown in the drawings, operation of the map creation device 10 according to the present embodiment will be described.
First, when the map creation device 10 accepts an instruction to start a map creation process, the CPU 11 starts the map creation program 15A, thereby performing the following steps.
At step S101, the CPU 11 obtains, from the in-vehicle camera 22, a plurality of images obtained by photographing different locations.
At step S102, the CPU 11 obtains, as sensor information, each of wheel speed from the wheel speed sensor 20 and a steering angle from the steering angle sensor 21.
At step S103, as an example, as described above, the CPU 11 calculates odometry information indicating the amount of movement of the vehicle, based on the wheel speed and the steering angle obtained at step S102. Specifically, the CPU 11 calculates a distance traveled by the vehicle, based on the wheel speed, and calculates a turning radius of the vehicle, based on the steering angle.
At step S104, the CPU 11 calculates an initial value of a homography matrix between the plurality of images, based on the odometry information of the vehicle calculated at step S103. Specifically, as an example, as described above, the CPU 11 computes the translation vector tc and the rotation matrix Rc from the amount of change in position (ΔXv, ΔYv) and the amount of change in yaw angle Δθv, and calculates the initial value G0 using the above-described equation (1).
At step S105, as an example, as described above, the CPU 11 calculates, by iterative computation, an optimal value of the homography matrix from the initial value calculated at step S104 and the luminance value of each pixel included in a road-surface region specified in the plurality of images. This iterative computation is performed in accordance with steps S111 to S117 described below.
At step S111, the CPU 11 calculates a luminance gradient matrix JI*, a Jacobian matrix Jw, and a Jacobian matrix JG for a tracking region in an image I* obtained before movement.
Specifically, the luminance gradient matrix JI* is computed from the luminance of each pixel in the tracking region in the image I* obtained before movement; the ith row of JI* is the luminance gradient [JI*ui JI*vi] of the ith pixel.
Note that JI*ui represents the luminance gradient in a horizontal direction of the ith pixel, and JI*vi represents the luminance gradient in a vertical direction of the ith pixel.
The Jacobian matrix Jw is computed from the coordinates of each pixel in the tracking region in the image I* obtained before movement, using the following equation for the ith pixel.

Jw,i = [u*i v*i 1 0 0 0 −u*i² −u*i v*i −u*i]
       [0 0 0 u*i v*i 1 −u*i v*i −v*i² −v*i]

Note that the coordinates of the ith pixel in the tracking region are represented by (u*i, v*i). In this case, Jw,i is the derivative of the warped coordinates of the ith pixel with respect to the nine elements of the homography matrix arranged row by row, evaluated at the identity.
The Jacobian matrix JG is computed from a basis Ai (i=1 to 8) of a Lie algebra, using the following equation.

JG = [[A1]v [A2]v … [A8]v]

In the equation, [Ai]v is a vector of 9 rows by 1 column obtained by rearranging the elements of Ai row by row.
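The disclosure does not state which basis of the Lie algebra is used. The sketch below builds JG from one common choice of eight traceless generators of sl(3), vectorizing each basis matrix row by row as described above; this particular basis is an assumption.

```python
import numpy as np

# One common choice of sl(3) basis matrices A1..A8 (an assumption of this
# sketch; the disclosure does not fix the basis).
A = [np.zeros((3, 3)) for _ in range(8)]
A[0][0, 2] = 1                       # x translation
A[1][1, 2] = 1                       # y translation
A[2][0, 1] = 1                       # shear / rotation components
A[3][1, 0] = 1
A[4][0, 0], A[4][1, 1] = 1, -1       # anisotropic scale (traceless)
A[5][1, 1], A[5][2, 2] = -1, 1
A[6][2, 0] = 1                       # projective components
A[7][2, 1] = 1

# J_G: 9x8 matrix whose columns are the basis matrices vectorized row by row.
J_G = np.column_stack([Ai.reshape(9) for Ai in A])
```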
At step S112, the CPU 11 substitutes the initial value G0 for an estimated value Ĝ of the homography matrix, and substitutes 1 for the number of iterations nite.
At step S113, the CPU 11 calculates a luminance gradient matrix JI for a tracking region in an image I obtained after movement.
Specifically, the coordinates in the image I obtained after movement are computed by transforming the coordinates (u*i, v*i) of each pixel in the tracking region by the estimated value Ĝ of the homography matrix and then normalizing by the third homogeneous component.
Note that the coordinates in the image I obtained after movement are represented by (ui, vi).
The luminance gradient matrix JI is computed from the luminance of each pixel in the tracking region in the image I obtained after movement, in the same manner as JI*; the ith row of JI is the luminance gradient [JIui JIvi] of the ith pixel.
Note that JIui represents the luminance gradient in a horizontal direction of the ith pixel, and JIvi represents the luminance gradient in a vertical direction of the ith pixel.
At step S114, the CPU 11 calculates a parameter x (a vector of 8 rows by 1 column) of the homography matrix. Specifically, the parameter x is computed using the following equation.

x = −Jesm+y

Note that Jesm+ represents the pseudo-inverse of Jesm.
In the equation, Jesm is a Jacobian matrix and is computed, row by row for each pixel, using the following equation.

Jesm,i = ((JIi + JI*i)/2)Jw,iJG

Note that Jesm,i represents the ith row of Jesm, and JIi and JI*i represent the ith rows of JI and JI*, respectively.
On the other hand, y is a luminance difference vector and is represented by the following equation, where N is the number of pixels in the tracking region.

y = [y1 y2 … yN]T

In the equation, yi is computed from the luminance Ii of the ith pixel obtained after movement and the luminance Ii* of the ith pixel obtained before movement, using the following equation.

yi = Ii − Ii*
At step S115, the CPU 11 updates the estimated value Ĝ of the homography matrix, using the following equation.

G = Ĝ exp(x1A1 + x2A2 + … + x8A8)

The above-described G is used as a new Ĝ.
At step S116, the CPU 11 determines whether or not a termination condition is satisfied, i.e., whether or not further iteration is unnecessary. If it is determined that the termination condition is satisfied, i.e., iteration is not required (in a case of an affirmative determination), then the CPU 11 transitions to step S117. If it is determined that the termination condition is not satisfied, i.e., iteration is required (in a case of a negative determination), then the CPU 11 returns to step S113 and repeats processing.
Specifically, when the root mean square of the luminance difference obtained this time is ycurr, ycurr is represented by the following equation.

ycurr = √((y1² + y2² + … + yN²)/N)
The maximum number of iterations is nmax (e.g., 100), and a threshold value for convergence determination is ε (e.g., 10⁻⁵).
In a case of nite=1, the root mean square ycurr of the luminance difference obtained this time is substituted for the root mean square yprev of a luminance difference obtained last time, and 1 is added to the number of iterations nite, and processing returns to step S113.
In a case of 1<nite<nmax, if yprev−ycurr>ε, then it is determined that the computation has not converged; thus, the root mean square ycurr of the luminance difference obtained this time is substituted for the root mean square yprev of the luminance difference obtained last time, 1 is added to the number of iterations nite, and processing returns to step S113. On the other hand, if yprev−ycurr≤ε, then it is determined that the computation has converged, and processing transitions to step S117.
In a case of nite=nmax, processing transitions to step S117.
At step S117, the CPU 11 adopts the estimated value Ĝ of the homography matrix as an optimal value GOPT, and processing returns to step S106.
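Putting steps S111 to S117 together, the following is a compact sketch of the iterative computation. The bilinear sampling helper, the sl(3) basis A (as sketched above), and the sign convention of the luminance difference are assumptions of this sketch rather than details fixed by the disclosure.

```python
import numpy as np
from scipy.linalg import expm
from scipy.ndimage import map_coordinates

def esm_refine(I_ref, I_cur, pts, G0, A, n_max=100, eps=1e-5):
    """Refine a homography by minimizing the luminance difference over a
    tracking region (steps S111-S117). pts: Nx2 array of (u*, v*) pixels."""
    u, v = pts[:, 0], pts[:, 1]
    ones, zeros = np.ones(len(pts)), np.zeros(len(pts))
    # Per-pixel warp Jacobians Jw,i (2x9 each), stacked into shape (N, 2, 9).
    Jw = np.stack([
        np.column_stack([u, v, ones, zeros, zeros, zeros, -u*u, -u*v, -u]),
        np.column_stack([zeros, zeros, zeros, u, v, ones, -u*v, -v*v, -v]),
    ], axis=1)
    J_G = np.column_stack([Ai.reshape(9) for Ai in A])        # 9x8

    gy_ref, gx_ref = np.gradient(I_ref)
    J_ref = np.column_stack([                                  # rows of JI*
        map_coordinates(gx_ref, [v, u], order=1),
        map_coordinates(gy_ref, [v, u], order=1)])
    I_star = map_coordinates(I_ref, [v, u], order=1)           # reference luma

    G, y_prev = G0.copy(), None
    for _ in range(n_max):
        # Warp tracking-region coordinates into the image after movement.
        p = G @ np.vstack([u, v, np.ones(len(pts))])
        uw, vw = p[0] / p[2], p[1] / p[2]
        gy, gx = np.gradient(I_cur)
        J_cur = np.column_stack([                              # rows of JI
            map_coordinates(gx, [vw, uw], order=1),
            map_coordinates(gy, [vw, uw], order=1)])
        y = map_coordinates(I_cur, [vw, uw], order=1) - I_star
        # ESM Jacobian row per pixel: ((JI,i + JI*,i)/2) Jw,i J_G.
        J_esm = np.einsum('ni,nij->nj', (J_cur + J_ref) / 2, Jw) @ J_G
        x = -np.linalg.pinv(J_esm) @ y                         # step S114
        G = G @ expm(sum(xi * Ai for xi, Ai in zip(x, A)))     # step S115
        y_curr = np.sqrt(np.mean(y ** 2))                      # step S116
        if y_prev is not None and y_prev - y_curr <= eps:
            break
        y_prev = y_curr
    return G                                                   # adopted as GOPT
```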
Referring back to the main flowchart, at step S106, the CPU 11 calculates the amount of change in camera position and the amount of change in camera orientation of the in-vehicle camera 22, and also calculates the estimated value nest of the road surface's normal vector, by resolving the optimal value GOPT calculated at step S105 as shown in the above-described equation (2).
At step S107, the CPU 11 determines whether or not the amount of change in camera position and the amount of change in camera orientation can be used. If it is determined that the amount of change in camera position and the amount of change in camera orientation can be used (in a case of an affirmative determination), then the CPU 11 transitions to step S108. If it is determined that the amount of change in camera position and the amount of change in camera orientation cannot be used (in a case of a negative determination), then the CPU 11 returns to step S101 and repeats processing. Specifically, when an error which is represented by an angle between the estimated value nest of the road surface's normal vector and a value n of a road surface's normal vector which is determined in advance by calibration of the in-vehicle camera 22 is less than a threshold value, the CPU 11 determines to use the amount of change in camera position and the amount of change in camera orientation in a subsequent process. On the other hand, when the error is greater than or equal to the threshold value, the CPU 11 determines not to use the amount of change in camera position and the amount of change in camera orientation in the subsequent process. When the amount of change in camera position and the amount of change in camera orientation are not used, an optimal value of a homography matrix is determined again from a different set of images obtained by photographing two different locations.
At step S108, the CPU 11 calculates three-dimensional positions of features in the plurality of images from the amount of change in camera position and the amount of change in camera orientation, and terminates a series of processes performed by the map creation program 15A. Specifically, the CPU 11 computes, using the principle of triangulation, three-dimensional positions of corresponding features between the images of two different locations from the positions of the features, the amount of change in camera position, and the amount of change in camera orientation.
Thus, according to the present embodiment, by calculating an initial value of a homography matrix from odometry information of the vehicle, iterative computation can be started with a value close to an optimal value of the homography matrix. Hence, the number of iterations of iterative computation is reduced.
In addition, when the texture (pattern) of a road surface is poor, an error in a homography matrix increases, and thus, an error in a normal line to the road surface determined from the homography matrix may also increase. The error in the normal line to the road surface is determined by comparison with a normal line to the road surface determined from camera installation orientation. When the error in the normal line to the road surface is large, it is determined that the determined amount of change in camera position and amount of change in camera orientation cannot be used, and a homography matrix can be determined again from a different set of images obtained by photographing two locations.
Note that, in the above-described embodiments, the processor indicates a processor in a broad sense and includes general-purpose processors (e.g., a CPU: Central Processing Unit) and dedicated processors (e.g., a GPU: Graphics Processing Unit, an ASIC: Application Specific Integrated Circuit, an FPGA: Field Programmable Gate Array, and a programmable logic device).
In addition, the operation of a processor in the above-described embodiments may not only be performed by one processor but also by a plurality of processors present at physically distant locations in a cooperative manner. In addition, the order in which various types of operation of the processor are performed is not limited to the one described in the above-described embodiments and may be changed as appropriate.
A map creation device according to the embodiment is exemplified and described above. The embodiment may be in the form of programs for causing a computer to perform a function of each part included in the map creation device. The embodiment may be in the form of a non-transitory computer-readable storage medium storing the programs.
In addition to the above, the configuration of the map creation device described in the above-described embodiment is an example and may be changed according to the situation, without departing from the true spirit.
In addition, the flow of a process performed by the program which is described in the above-described embodiment is also an example, and an unnecessary step may be deleted or a new step may be added or the processing order may be changed without departing from the true spirit.
In addition, although the above-described embodiment describes a case in which by executing the program, processes according to the embodiment are implemented by a software configuration using a computer, the embodiment is not limited thereto. The embodiment may be implemented, for example, by a hardware configuration or a combination of a hardware configuration and a software configuration.
The disclosure of Japanese Patent Application No. 2021-178338 filed Oct. 29, 2021 is incorporated herein by reference in its entirety. All literatures, patent applications, and technical standards described herein are incorporated herein by reference to the same extent as if each individual literature, patent application, or technical standard was specifically and individually indicated to be incorporated by reference.
This is a National Stage of International Application No. PCT/JP2022/040838 filed Oct. 31, 2022, claiming priority based on Japanese Patent Application No. 2021-178338 filed Oct. 29, 2021, the entire contents of which are incorporated in their entirety.