The present invention relates to robot positioning technology, and specifically to a map-based consistent and efficient filtering algorithm for visual inertial positioning.
Positioning technology is an essential basic module for intelligent robots. In recent years, cameras and inertial measurement units (IMU) have been widely used in positioning technology, spawning a large number of excellent visual inertial odometry (VIO) algorithms. However, the positioning accuracy of VIO will inevitably drift as the run time of the algorithm elapses. This requires the introduction of a pre-built map to eliminate a cumulative error. When the positioning algorithm fuses pre-built map information, the system needs to ensure consistent and effective fusion of the map information in order to effectively use the map information to eliminate the accumulated error of the odometry.
Frameworks of current positioning algorithms are divided into two types. One type is to initialize the relative transformation between the local odometry reference system and the pre-built map reference system, and then perform real-time positioning in the map reference system. The other type is to simultaneously maintain the local odometry and the relative transformation between the local odometry reference system and the pre-built map reference system, thereby indirectly obtaining the position in the map reference system. The limitation of the first type of algorithm is that for visual inertial positioning, the pre-built map needs to be in the inertial system, which limits the application of the positioning algorithm. For the second type of algorithm, although there is no limitation of the first type of algorithm, potential problems in this type of framework may destroy the observability of the system and make the originally unobservable information observable, thereby introducing false observable information, which reduces the positioning accuracy and even causes the algorithm to diverge.
In addition, due to limitations of computational efficiency or algorithm framework, a vast majority of algorithms do not consider the uncertainty of the map and believe that the map is absolutely accurate. This will make the positioning algorithms rely too much on the inaccurate map, greatly reducing the positioning effect. Even though some algorithms consider the uncertainty of the map in the above framework, the algorithms destroy the observability of the system and make the originally unobservable information observable, thereby introducing false observable information, which reduces the positioning accuracy and even causes the algorithm to diverge.
How to ensure the correct observability of the system while considering the uncertainty of the map, thereby improving the accuracy and reliability of the map-based positioning algorithm is a problem that needs to be solved. In addition, it is further necessary to ensure that the computational load of the system is kept at a low level.
To overcome the shortcomings of the existing technology, the present invention proposes a filtering-based positioning algorithm and a visual inertial positioning technology based on a pre-built visual map. The proposed visual inertial positioning technology is implemented based on a consistent and efficient filtering algorithm. The present invention is implemented through the following technical solution:
The present invention discloses a map-based consistent and efficient filtering algorithm for visual inertial positioning, where the algorithm is implemented through the following system, and the system includes three modules: a local odometry module, a map feature matching module, and a map-based positioning module, where the local odometry module is configured to receive data of a camera and an IMU, obtain a state of the system in a local reference system in real time, and obtain values of corresponding state variables and covariance of the values; the map feature matching module is configured to detect a similarity between a scene observed by the camera at a current moment and a pre-built map scene, and obtain a feature matching pair of an image feature at the current moment and a map feature; and the map-based positioning module is configured to receive an output amount of the local odometry and the feature matching pair, obtain an updated state of the local odometry and a relative transformation (augmentation variable) between the local reference system and a map reference system, and then calculate a state of a robot in the map reference system.
As a further improvement, in the present invention, the local odometry module includes the IMU, a state propagation module in signal connection to the IMU, the camera, a feature tracking module in signal connection to the camera, and a Multi-State Observability Constraint-Schmidt-Invariant Kalman Filter (MSOC-S-IKF) state update module based on local feature observation in signal connection to the state propagation module and the feature tracking module.
As a further improvement, in the present invention, the IMU is configured to provide real-time rotation angular velocity and linear acceleration for the system; the state propagation module is configured to receive the rotation angular velocity and the linear acceleration that are provided by the IMU, propagate the state of the system from a previous moment to the current moment by using the quantities, and obtain state variables predicted at the current moment and a covariance; and transmit signals of the obtained state variables and covariance to the MSOC-S-IKF state update module based on local feature observation; the feature tracking module is configured to track positions of feature points in an image at the previous moment in an image at the current moment, obtain tracked feature points in the image at the current moment, and transmit signals of the obtained feature points to the MSOC-S-IKF state update module based on local feature observation; and the MSOC-S-IKF state update module based on local feature observation is configured to calculate, through inputted feature point information combined with the inputted predicted state variables and covariance, an observation error through a reprojection error, and update the state variables and the covariance using an invariant Kalman filter (IKF) combined with a multi-state constraint (MS) proposed by the invention.
As a further improvement, in the present invention, the state variables and the covariance corresponding to the state variables are required to include the following variables:
As a further improvement, in the present invention, the state variables 1), 2), and 5) are represented together on a new Lie group space SE2+KM(3), the state variables 4) and 6) are represented in a Lie group space SE(3), the state variable 3) is represented in a Euclidean vector space, and a specific definition of SE2+KM(3) is:
As a further improvement, in the present invention, when the state propagation module and the MSOC-S-IKF state update module based on local feature observation solve a Jacobian matrix of a motion equation or an observation equation with respect to a state variable error, the state error used is defined as above.
As a further improvement, in the present invention, the MSOC-S-IKF state update module based on local feature observation completes state update through the following equations:
As a further improvement, in the present invention, the map feature matching module is configured to obtain, through information about the current camera and a map, map feature points obtained by the current camera through matching and a map key frame in which the map feature points are observable, namely, a feature pair and a map key frame that match, where signals of the feature pair and the map key frame that match are transmitted to the map-based positioning module.
As a further improvement, in the present invention, the map-based positioning module includes the following modules:
As a further improvement, in the present invention, the module for determining whether there is a feature matching pair is configured to determine whether the map feature matching module has an output, where if there is an output, indicating that there is a feature matching pair between the current frame and a map frame, a subsequent map-based positioning-related process is performed; or if there is no output, indicating that there is no feature matching pair between the current frame and a map frame, an odometry state and an augmentation variable are outputted directly (if the augmentation variable has been initialized) without performing MSOC-S-IKF state update based on map feature observation; the module for determining whether an augmentation variable is initialized is configured to determine whether the system is required to initialize the augmentation variable (namely, the relative pose between the local odometry reference system and the map reference system), where if the map feature matching module has outputted a feature pair and a map key frame that match and the augmentation variable has been initialized, the map key frame is added directly to the system state, and the feature pair and the map key frame that match are further used for the subsequent MSOC-S-IKF state update module based on map feature observation; or if the map feature matching module has outputted a feature pair and a map key frame that match, but the augmentation variable has not been initialized, the feature pair and the map key frame that match are used for the module for initializing an augmentation variable using PnP; the module for initializing an augmentation variable using PnP is configured to solve a relative pose between the map key frame and the current frame of the camera by receiving the feature pair and the map key frame that match outputted by the map feature matching module and using a PnP algorithm, and solve the relative pose (the augmentation variable) between the local odometry reference system and the map reference system by using the pose in the map key frame and a pose of the robot in the local odometry reference system at the current moment, to complete the initialization; the module for adding an augmentation variable to a system state is configured to add the initialized augmentation variable and covariance thereof to the system state, to maintain and update the augmentation variable in real time, so that the estimation accuracy gradually improves and becomes stable; the module for adding a pose in a map key frame to a system state is specifically as follows: when there is a feature matching pair between the map key frame and the current frame, and the augmentation variable has been initialized, the map key frame is directly added to the system state; or when there is a feature matching pair between the map key frame and the current frame, and the augmentation variable has not been initialized, the initialization is completed through the module for initializing an augmentation variable using PnP, and then a signal of the pose in the map key frame used for initialization is transmitted to the module for adding a pose in a map key frame to a system state.
As a further improvement, in the present invention, the MSOC-S-IKF state update module based on map feature observation includes three parts:
The beneficial effects of the present invention are as follows:
The technical solution of the present invention is further described below through the specific embodiments with reference to the accompanying drawings of the specification.
The entire system involves five types of coordinate systems, as shown in
Before introducing each module of the system, representation of state variables maintained by the system is first introduced. Different from conventional systems that represent state variables in Euclidean vector space, one of the innovation points of the present invention is to represent the state variables of the system in a new Lie group space SE2+KM(3). Lie group SE2+KM(3) is composed of Lie group SE2+K(3) and Lie group SO(3), and is specifically defined as:
Based on the definitions of the Lie group SE2+KM(3) and the corresponding Lie algebra (3), the present invention represents the state of the system on SE2+KM(3), specifically, on SE2+11(3) (it is assumed herein that only one local feature point is considered). A state of the system at the moment t is defined as Xt∈SE2+11(3)×:
With the definition (1) of the state of the system, it is further necessary to define an error corresponding to this state. Different from the conventional definition in Euclidean vector space, the state error of (1) is defined as:
In particular, when the uncertainty of the map needs to be considered, the relevant information of the map needs to be also added to the state of the system, so that the state of the system can be further expanded from equation (1) to:
The error of the map feature point is defined as:
The error of the state X*t of the entire system considering map uncertainty is expanded from equation (3) to:
Since the state of the system is defined in the Lie group SE2+KM(3) and has a special error definition (4), the state and variance propagation and update performed in this system state is called IKF, and a specific state propagation and update method will be introduced in the next section.
As shown in
Then, the error of XMS
The data of the IMU at the moment t is used for state propagation (prediction). Through the kinematic model of the IMU, the state prediction of the system at the moment t+1 can be obtained:
The image of the camera is used for the observation equation part. By tracking the feature points of the image stream, that is, the feature tracking module in
By performing first-order Taylor expansion on equation (8) in the current estimated state, the following can be obtained:
represents the Jacobian matrix of the observation equation (8) with respect to the state XMS
and òf are the “right invariant errors” of the relevant quantities.
Since a feature f can be observed by the camera corresponding to poses in XClone
By calculating the left null space N of Hf, the equation (11) can be transformed into:
The input is: the observation residual r (for equation (12), there is r=fr′MS), the Jacobian H of the observation equation with respect to the system state (for equation (12), there is H=H′MS
Update:
The output is: the updated system state at the moment t and the corresponding covariance Pt|t.
As shown in
This part of state update involves two innovation points, where one innovation point is the combination of S and IKF, to take the uncertainty of the map into account; and the other innovation point is the OC algorithm, which is used to ensure that the original observability of the system will not be destroyed when map-based state update is performed. The two innovation points are intended to maintain good consistency in the system and then improve the positioning accuracy of the map-based positioning algorithm. The states involved in this module are:
Through the map feature matching module, the feature matching pair of the map key frame and the current frame can be obtained, as shown by the black points of KF1 and KF2 and the white point of Ct in
If the feature matching pair between the map key frame and the current frame is obtained, and the augmentation variable LTG has not been initialized, the relative pose between the map frame and the current frame, that is, KF
By performing first-order Taylor expansion on equation (14) at the estimated value, the following observation equation can be obtained:
Since the feature points of the map are not added to the states, it is necessary to implicitly take the uncertainty of the map into account to ensure the consistency of the system. The specific method is: After the two observation equations in (15) are merged into one equation through a matrix stacking operation, the error term corresponding to the map feature point GpF
Map-based positioning needs to consider the pose of the map key frame in the state, and as the running time of the system increases, the number of map key frame pairs in the state gradually increases. Therefore, to ensure the computational efficiency of the filtering algorithm, the idea of S state update is incorporated when IKF state update is performed. Specifically, for the observation equation (16), the state update is performed according to the following steps:
When the Jacobian matrix of the equation (14)-(a) with respect to the state XS
Through the above algorithm design, the complexity of the algorithm can be linearly related to the quantity of map key frames. Compared with the quadratic relationship of the standard EKF, the complexity is greatly simplified. In addition, the amount of stored data of the map is also greatly reduced compared with the conventional method. The specific analysis is as follows:
It is assumed that the map has m key frames and n road marks (m<<n). First, a map data storage amount is analyzed. In a conventional method, to consider the uncertainty of the map, map feature points are added to a state, rather than adding map key frames to the state in the present invention. Therefore, the complexity of the amount of map data that needs to be stored is O(n+n2)=O(n2). However, according to the algorithm design of the present invention, it is only necessary to maintain the map key frame and its covariance in the state. Therefore, the complexity of the amount of map data stored with the algorithm of the present invention is O(m+m2+n)=O(m2+n). Compared with O(n2), the storage amount is greatly reduced.
Next, the calculation complexity is analyzed. Compared with the standard EKF/IKF, in the Schmidt update algorithm used, since the auto-covariance update part related to the map variable is 0 when the state covariance is updated, the complexity of the algorithm is reduced from O(m2) to O(m). The following table lists complexity cases of different algorithms:
The algorithm of the present invention is the last configuration in the table. It can be seen that the algorithm of the present invention has optimal storage complexity and calculation complexity.
To verify the effectiveness of the proposed MSOC-S-IKF, the pure odometry Open-VINS[1] is compared with the optimization-based algorithm VINS-Fusion[2][3], and an ablation experiment is conducted to verify the effectiveness of the innovation points proposed by the present invention. All the experiments are implemented based on a computer with Intel i7-10700@2.9 GHZ CPU and 16G RAM. MSC-S-EKF does not consider OC and IKF; MSC-EKF does not consider OC, S, and IKF; MSC-IKF does not consider OC and S. The experiment is verified on four datasets: EuRoC[4], corresponding to V102-MH05 in the following table; Kaist[5], corresponding to Urban39 in the following table; YQ[6], corresponding to YQ2-YQ4 in the following table; 4Seasons[7], corresponding to Office-Loop2 in the following table. In the table, “-” means that the algorithm cannot obtain this variable, and “” means that the algorithm cannot obtain the correct positioning result on the corresponding dataset.
From the above table, it can be seen that the proposed algorithm MSOC-S-IKF can run successfully on all datasets and has the best positioning results most of the time. Compared with the pure odometry Open-VINS, MSOC-S-IKF has significantly improved positioning accuracy due to its ability to correctly fuse map information, especially for the large scene datasets Kaist, YQ, and 4seasons. Compared with the optimization-based positioning algorithm VINS-Fusion, and the ablation algorithms MSC-EKF and MSC-IKF, because the algorithms do not correctly consider the uncertainty of the map, the positioning accuracy (especially on large scene datasets) is far inferior to that of MSOC-S-IKF. On some datasets, the compared algorithms even fail to run normally. Compared with MSC-S-IKF, because the algorithms fail to maintain the correct observability of the system, the positioning results (especially the positioning result of the local odometry) are very poor.
In addition, to verify the high efficiency of the algorithm of the present invention, calculation time consumption statuses of a plurality of algorithms are compared. There are four algorithms in total: Open-VINS, MSC-EKF, VINS-Fusion, and MSOC-S-IKF. The experimental results are shown in
Open-VINS is pure odometry with the lowest time complexity, but has a severe drift problem (refer to the above table). The MSC-EKF algorithm does not consider the uncertainty of the map and the observability of the system, and has a very poor algorithm positioning result although the time consumption is low. VINS-Fusion is an optimization-based algorithm, which does not consider the uncertainty of the map, and has high time consumption and low positioning accuracy. The algorithm MSOC-S-IKF of the present invention has good consistency and positioning accuracy, has algorithm time consumption significantly superior to that of the optimization-based algorithm, and has good real-time performance.
Obviously, the present invention is not limited to the above embodiments, and may have many modifications. All modifications that a person of ordinary skill in the art can directly derive or associate from the disclosure of the present invention should be considered to be within the protection scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
202210458943.3 | Apr 2022 | CN | national |
The present application is a Continuation-In-Part Application of PCT Application No. PCT/CN2023/072749 filed on Jan. 17, 2023, which claims the benefit of Chinese Patent Application No. 202210458943.3 filed on Apr. 27, 2022. All the above are hereby incorporated by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/CN2023/072749 | Jan 2023 | WO |
Child | 18785121 | US |