The disclosure belongs to the fields of computer vision and deep learning, and
particularly relates to a heterogeneous three-dimensional observation registration method based on depth phase correlation, and to a related medium and device.
Heterogeneous observation registration is a crucial technology in vision and robotics, which is used to register two observations that differ in, for example, angle, scale, or viewing angle. The observation may be, for example, an image, a point cloud, or a mesh model.
In the existing technology, the patent with application number CN202110540496.1 discloses a neural-network-based heterogeneous image pose estimation and registration method, device, and medium. This solution optimizes the phase correlation algorithm to be differentiable, embeds the algorithm into an end-to-end learning network framework, and constructs a heterogeneous image pose estimation method based on a neural network. This method can find the optimal feature extractor based on the image matching results, thereby achieving accurate pose estimation and registration of the heterogeneous images. However, this registration method is only applicable to two-dimensional images and cannot achieve registration of three-dimensional observation objects.
For the registration of three-dimensional observations, especially the pose registration task of homogeneous and heterogeneous observations without an initial value, the number of degrees of freedom may reach up to 7, which is higher than that of the registration task for two-dimensional images.
While learning-based methods using differentiable solvers have proven promising, such methods either rely on heuristically defined correspondences or are prone to local optima. Therefore, for the registration of three-dimensional observations, designing a pose registration method that can be trained end-to-end to complete the matching of heterogeneous sensors is an urgent technical problem that needs to be solved in the existing technology.
The purpose of the disclosure is to solve the problem of difficulty in registration of three-dimensional observations in the related art, and to provide a heterogeneous three-dimensional observation registration method based on depth phase correlation.
The specific technical solutions adopted by the disclosure are as follows.
In the first aspect, the disclosure provides a heterogeneous three-dimensional observation registration method based on depth phase correlation, configured to register a three-dimensional and heterogeneous first target observation and a source observation, including the following.
As a preferred option of the first aspect, the ten 3D U-Net networks used in the registration method are pre-trained, and the total loss function of the training is the weighted sum of the rotation transformation relationship loss, the scaling transformation relationship loss, the translation transformation relationship loss in the x direction, the translation transformation relationship loss in the y direction, and the translation transformation relationship loss in the z direction between the first target observation and the source observation.
As a preferred option of the first aspect, the weighting coefficients of the five losses in the total loss function are all 1.
As a preferred option of the first aspect, all five losses in the total loss function adopt L1 loss.
As a preferred option of the first aspect, the ten 3D U-Net networks used in the registration method are independent of each other.
As a preferred option of the first aspect, the observation types of the first target observation and the source observation are three-dimensional medical imaging data, three-dimensional scene measurement data, or three-dimensional object data.
As a preferred option of the first aspect, the rotation transformation relationship includes three degrees of freedom, which are respectively three rotation angles of zyz Euler angles.
As a preferred option of the first aspect, in S13, S15, and S17, the translation transformation relationships of the three dimensions of xyz are obtained simultaneously through phase correlation solution, while only the direction dimension corresponding to each of the steps is retained.
In the second aspect, the disclosure provides a computer-readable storage medium. A computer program is stored on the storage medium. When the computer program is executed by a processor, the heterogeneous three-dimensional observation registration method based on depth phase correlation as described in any option of the first aspect can be realized.
In the third aspect, the disclosure provides a computer electronic device, which includes a storage and a processor.
The storage is used to store computer programs.
The processor is configured to implement the heterogeneous three-dimensional observation registration method based on depth phase correlation as described in any option of the first aspect when executing the computer programs.
The disclosure will be further elaborated and described below together with the accompanying drawings and specific embodiments. The technical features of various embodiments of the disclosure may be combined correspondingly as long as the features do not conflict with each other.
In the description of the disclosure, it should be understood that the terms “first” and “second” are only used for differentiation and description purposes, and may not be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features.
In the real world, three-dimensional observation data obtained through different sensors are often limited by the characteristics of the sensors themselves, such as angles, proportions, and viewing angles, so there is heterogeneity among three-dimensional observations obtained from the same three-dimensional object. Moreover, the sensor may also be subjected to different forms of interference when obtaining data, and the interference greatly increases the difficulty in registering two heterogeneous observations.
The disclosure optimizes the phase correlation algorithm into a globally convergent differentiable phase correlation solver, and combines the solver with a simple feature extraction network, thereby establishing a heterogeneous three-dimensional observation registration method based on deep phase correlation. Specifically, the method first learns dense features from a pair of heterogeneous observations through feature extractors. Then, the features are converted into translation- and scale-invariant spectral characterizations based on Fast Fourier Transform and spherical radial aggregation, and the translation and scale are decoupled from rotation. Next, a differentiable phase correlation solver is used to estimate the rotation, scale, and translation in the spectra step by step, independently and efficiently, thereby obtaining a pose estimation between the two heterogeneous three-dimensional observations, based on which registration can be performed. In the entire registration method, the method framework of the pose estimation is differentiable and can be trained end-to-end, with good interpretability and generalization capability.
In a preferred embodiment of the disclosure, a specific implementation of a heterogeneous three-dimensional observation registration method based on depth phase correlation is provided. As shown in
In the disclosure, a pose estimation result between the first target observation and the source observation of the original input may be obtained through pose estimation. The pose estimation result comprises translation, rotation, and scaling transformation relationships of 7 degrees of freedom, thereby the first target observation is registered to the source observation. Among the 7 degrees of freedom of the pose estimation result, the translation transformation relationship includes three degrees of freedom xyz; the rotation transformation relationship may be an SO(3) rotation relationship, which also includes three degrees of freedom; and the scaling transformation relationship includes one degree of freedom.
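For illustration only, the 7-degree-of-freedom pose described above (rotation R, scale Mu, translation T) can be viewed as a similarity transform acting on a point set. The following NumPy sketch applies such a transform; the function name `pose_apply` and the sample values are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def pose_apply(points, R, mu, t):
    # scale by mu, rotate by R, then translate by t: the 7 degrees of freedom
    return mu * points @ R.T + t

theta = np.pi / 6  # example rotation about the z axis (one rotational DoF)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
mu = 2.0                           # 1 scaling DoF
t = np.array([0.1, -0.3, 0.7])     # 3 translation DoF
src = np.random.default_rng(1).normal(size=(50, 3))
tgt = pose_apply(src, R, mu, t)    # registered point set
```

Registration amounts to estimating R, mu, and t so that the transformed first target observation coincides with the source observation.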
In order to achieve the pose estimation with the 7 degrees of freedom, the disclosure constructs ten independent trainable 3D U-Net networks in the three stages of rotation, scaling, and translation for the first target observation and the source observation. After being pre-trained under the supervision of the three types of losses of translation, rotation, and scaling, the ten 3D U-Net networks may extract isomorphic features, that is, common features, from the heterogeneous three-dimensional observations, thereby converting the two heterogeneous three-dimensional observations into isomorphic three-dimensional characterizations. The 3D U-Net network is a network that learns three-dimensional segmentation from sparsely annotated three-dimensional stereo data. In the network, the basic model structure and principle are similar to the 2D U-Net network, including the encoding path part and the decoding path part. The difference lies in the 3D generalization compared to the 2D U-Net network, that is, the convolution, deconvolution, and pooling operations in the encoding path part and the decoding path part are expanded from two dimensions to three dimensions. The specific model structure and principle of the 3D U-Net network belong to the existing technology and may be implemented by directly utilizing the existing network model, so details will not be repeated here.
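The encoder-decoder structure with 3D convolutions described above may be sketched, under the assumption of a PyTorch implementation, as a minimal single-level toy network with one skip connection. `TinyUNet3D` is illustrative only and not the network actually used:

```python
import torch
import torch.nn as nn

class TinyUNet3D(nn.Module):
    """Toy 3D U-Net: one encoder level, one decoder level, one skip connection."""
    def __init__(self, in_ch=1, feat=8, out_ch=1):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv3d(in_ch, feat, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool3d(2)                      # 3D pooling
        self.bottleneck = nn.Sequential(nn.Conv3d(feat, feat * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose3d(feat * 2, feat, 2, stride=2)  # 3D deconvolution
        self.dec = nn.Sequential(nn.Conv3d(feat * 2, feat, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(feat, out_ch, 1))

    def forward(self, x):
        e = self.enc(x)                                  # encoding path
        b = self.bottleneck(self.down(e))
        u = self.up(b)                                   # decoding path
        return self.dec(torch.cat([u, e], dim=1))        # skip connection

net = TinyUNet3D()
vol = torch.randn(1, 1, 16, 16, 16)   # (batch, channel, D, H, W)
out = net(vol)                        # dense feature map, same spatial size
```

The dense per-voxel output is what allows such a network to serve as a feature extractor producing isomorphic characterizations of the same spatial resolution as the input.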
It should be noted that among the 7 degrees of freedom in the disclosure, the scaling transformation relationship comprising only 1 degree of freedom is predicted by one set of two 3D U-Net networks, and the rotation transformation relationship comprising 3 degrees of freedom is likewise predicted as a whole by a single set of two 3D U-Net networks. In the translation transformation relationship, however, the x-direction, y-direction, and z-direction translations are decoupled, and the translation transformation in each direction requires a separately trained set of two 3D U-Net networks for prediction, so as to achieve improved accuracy.
The specific implementation process of the heterogeneous three-dimensional observation registration method based on depth phase correlation is described in detail below. The original inputs are the source observation as a template and the first target observation as a registration object. The registration steps are as follows.
It should be noted that the phase correlation solution for spherical surface characterization belongs to the existing technology and may be implemented through a combination of spherical Fast Fourier Transform, element-wise dot product calculation, and SO(3) inverse Fast Fourier Transform, so details will not be repeated here. The rotation transformation relationship obtained by the solution is the SO(3) rotation relationship comprising three degrees of freedom. In this embodiment, zyz Euler angles may be used. Therefore, the R obtained by the solution actually comprises the three rotation angles of the zyz Euler angles. At this time, R is a three-dimensional tensor. Certainly, in addition to the zyz Euler angles used in this embodiment, other Euler angle transformation forms may also be used.
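As an illustrative aside, the composition of the three zyz Euler angles into an SO(3) rotation matrix may be sketched as follows (NumPy; the helper names are assumptions for illustration):

```python
import numpy as np

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])

def rot_y(t):
    return np.array([[ np.cos(t), 0.0, np.sin(t)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(t), 0.0, np.cos(t)]])

def euler_zyz_to_matrix(alpha, beta, gamma):
    # zyz convention: rotate about z, then y, then z again
    return rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)

R = euler_zyz_to_matrix(0.3, 0.5, -0.2)  # an SO(3) rotation from three angles
```

The three angles are exactly the three rotational degrees of freedom carried by R.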
The above rotation transformation relationship R is essentially the angle by which the first target observation needs to be rotated to be registered to the source observation.
It should be noted that the function of Fast Fourier Transform is to perform Fast Fourier Transform on the 3D feature maps extracted by the 3D U-Net network, removing the translation transformation relationship between the feature maps but retaining the rotation and scaling transformation relationships. According to the characteristics of Fast Fourier Transform, only the rotation and scale have an impact on the amplitude of the spectrum, but the amplitude of the spectrum is not sensitive to translation. Therefore, after introducing FFT, a representation manner insensitive to translation but particularly sensitive to scaling and rotation is obtained, and translation may be ignored when solving scaling and rotation subsequently.
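The translation insensitivity of the amplitude spectrum noted above can be checked numerically; for a circular (wrap-around) shift, the FFT magnitude is exactly unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
vol = rng.random((32, 32, 32))
shifted = np.roll(vol, shift=(5, -3, 2), axis=(0, 1, 2))  # pure circular translation

mag = np.abs(np.fft.fftn(vol))
mag_shifted = np.abs(np.fft.fftn(shifted))
# translation only changes the phase of the spectrum; the two magnitudes coincide
```

Rotation and scaling, by contrast, do alter the magnitude spectrum, which is why they can still be solved after this step.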
It should be noted that in the logarithmic polar transformation, log-polar transformation is performed on the amplitude spectra after FFT transformation and 2D compression, and the spectra are mapped from the Cartesian coordinate system to the logarithmic polar coordinate system. In this mapping process, the scaling and rotation transformations in the Cartesian coordinate system may be converted into translation transformation in the logarithmic polar coordinate system.
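A minimal nearest-neighbour log-polar resampling may be sketched as follows (illustrative only; a practical implementation would use interpolation, and the function name is an assumption):

```python
import numpy as np

def log_polar_sample(img, n_rho=64, n_theta=64):
    # resample a 2D array onto a log-polar grid centred at the array centre
    h, w = img.shape
    cy, cx = h / 2.0, w / 2.0
    rho = np.exp(np.linspace(0.0, np.log(min(cy, cx)), n_rho))   # log-spaced radii
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    ys = (cy + rho[:, None] * np.sin(theta)[None, :]).astype(int) % h
    xs = (cx + rho[:, None] * np.cos(theta)[None, :]).astype(int) % w
    return img[ys, xs]   # rows index log-radius, columns index angle

lp = log_polar_sample(np.random.default_rng(4).random((64, 64)))
```

Under this mapping, scaling the input shifts samples along the log-radius axis, and rotating it shifts them along the angle axis, which is exactly what allows a translation solver to recover scale and rotation.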
It should be noted that the phase correlation solution is to calculate the cross-correlation between the two 2D amplitude spectra. From the correlation obtained by the solution, the translation transformation relationship between the two spectra may be obtained. The specific calculation process of cross-correlation belongs to the existing technology, so details will not be repeated here. The translation transformation relationship obtained by the phase correlation solution needs to be converted back to the Cartesian coordinate system to form a relative scaling transformation relationship between the first target observation and the source observation. It may be seen that the coordinate system conversions in S9 and S10 form a corresponding pair of operations whose mapping relationships are inverse to each other.
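The phase correlation solution itself can be sketched as locating the peak of the normalized cross-power spectrum; the helper name `phase_correlation_shift` below is an assumption for illustration:

```python
import numpy as np

def phase_correlation_shift(a, b):
    # estimate d such that b == np.roll(a, d): peak of the normalized cross-power spectrum
    cross = np.fft.fftn(a) * np.conj(np.fft.fftn(b))
    cross /= np.abs(cross) + 1e-12          # keep phase only (normalization)
    corr = np.real(np.fft.ifftn(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap peak indices to signed shifts
    return tuple(-(p if p <= s // 2 else p - s) for p, s in zip(peak, a.shape))

rng = np.random.default_rng(2)
a = rng.random((32, 32))
b = np.roll(a, shift=(4, -7), axis=(0, 1))  # known circular shift
d = phase_correlation_shift(a, b)
```

Applied to the two log-polar amplitude spectra, the recovered shift corresponds to the rotation and scaling between the observations.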
The scaling transformation relationship Mu is essentially the scale factor by which the first target observation needs to be scaled to be registered to the source observation.
The translation transformation relationship Tx is essentially the distance by which the first target observation needs to be translated in the x direction to be registered to the source observation.
The translation transformation relationship Ty is essentially the distance by which the first target observation needs to be translated in the y direction to be registered to the source observation.
The translation transformation relationship Tz is essentially the distance by which the first target observation needs to be translated in the z direction to be registered to the source observation.
It should be noted that in the operations S13, S15, and S17, although the translations in the three directions of xyz are decoupled and a translation transformation relationship in only one of the directions is obtained in each operation, when the phase correlation solution is actually performed, the translation transformation relationships of all three direction dimensions of xyz are still obtained simultaneously, while only the direction dimension corresponding to each operation is retained. That is to say, in S13 only the translation transformation Tx in the x direction is retained; in S15 only the translation transformation Ty in the y direction is retained; and in S17 only the translation transformation Tz in the z direction is retained. Finally, an overall translation transformation relationship T=(Tx, Ty, Tz) may be obtained by combining the translations in the three directions. To register the first target observation to the source observation, the overall translation transformation T needs to be performed in the three directions.
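The recovery of all three translation components from a single 3D phase correlation solve, with only one component retained per stage, can be sketched as follows (a single pair of toy volumes stands in for the per-stage feature maps, which in the actual method come from different extractors):

```python
import numpy as np

def phase_corr_3d(a, b):
    # one 3D phase-correlation solve yields all three translation components
    cross = np.fft.fftn(a) * np.conj(np.fft.fftn(b))
    cross /= np.abs(cross) + 1e-12
    peak = np.unravel_index(np.argmax(np.real(np.fft.ifftn(cross))), a.shape)
    return tuple(-(p if p <= s // 2 else p - s) for p, s in zip(peak, a.shape))

rng = np.random.default_rng(3)
src = rng.random((24, 24, 24))
tgt = np.roll(src, shift=(3, -5, 6), axis=(0, 1, 2))

# each stage solves all three dimensions but keeps only one of them
tx = phase_corr_3d(src, tgt)[0]   # S13: retain x only
ty = phase_corr_3d(src, tgt)[1]   # S15: retain y only
tz = phase_corr_3d(src, tgt)[2]   # S17: retain z only
T = (tx, ty, tz)                  # combined overall translation
```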
It may be seen that the pose estimation of the disclosure is implemented in three stages. The pose estimation of the three transformation relationships of rotation, scaling, and translation is carried out stage by stage. Finally, estimated transformation values of a total of 7 degrees of freedom (R, Mu, and T, in which R and T each have three degrees of freedom) are obtained. Based on the estimated transformation values of the 7 degrees of freedom, heterogeneous observation registration may be performed between the first target observation and the source observation.
It should be noted that in the registration process, the ten 3D U-Net networks used to estimate R, Mu, and T are independent of each other and all need to be pre-trained. In order to ensure that each 3D U-Net network may accurately extract isomorphic features, a reasonable loss function needs to be set. The ten 3D U-Net networks are trained together under the same training framework, and a total loss function of the training should be a weighted sum of the rotation transformation relationship loss (i.e. the loss of R), the scaling transformation relationship loss (i.e. the loss of Mu), the translation transformation relationship loss in the x direction (i.e. the loss of Tx), the translation transformation relationship loss in the y direction (i.e. the loss of Ty), and the translation transformation relationship loss in the z direction (i.e. the loss of Tz) between the first target observation and the source observation, and specific weighting values may be adjusted according to actual conditions.
In this embodiment, the weighting coefficients of the five losses in the total loss function are all 1, and all five losses use L1 loss. For ease of description, the rotation transformation relationship R predicted in S4 is denoted as rotation_predict, the scaling transformation relationship Mu predicted in S10 is denoted as scale_predict, the translation transformation relationship Tx in the x direction predicted in S13 is denoted as x_predict, the translation transformation relationship Ty in the y direction predicted in S15 is denoted as y_predict, and the translation transformation relationship Tz in the z direction predicted in S17 is denoted as z_predict. Therefore, during each round of training, R, Mu, and (Tx, Ty, Tz) between the two heterogeneous three-dimensional observations may be obtained based on the current parameters of the model, and then the total loss function L is calculated according to the following process and network parameters are updated.
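The total loss described above may be sketched as follows (NumPy; the ground-truth dictionary layout is an assumption for illustration, using the prediction names defined in this embodiment):

```python
import numpy as np

def l1(pred, gt):
    return np.abs(np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)).mean()

def total_loss(rotation_predict, scale_predict, x_predict, y_predict, z_predict, gt):
    # unit-weighted sum of five L1 terms: R, Mu, Tx, Ty, Tz
    return (l1(rotation_predict, gt["rotation"]) + l1(scale_predict, gt["scale"])
            + l1(x_predict, gt["x"]) + l1(y_predict, gt["y"]) + l1(z_predict, gt["z"]))

gt = {"rotation": [0.1, 0.2, 0.3], "scale": 1.5, "x": 2.0, "y": -1.0, "z": 0.5}
L = total_loss([0.1, 0.2, 0.3], 1.5, 2.0, -1.0, 0.5, gt)  # perfect prediction
```

In training, this scalar would be backpropagated through the differentiable solver to update all ten feature extractors jointly.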
The ten 3D U-Net networks after training may be used to perform pose estimation between the two heterogeneous three-dimensional observations in the operations S1˜S18, and image registration may be performed based on estimation results.
In order to further evaluate the technical effect of the registration method described in S1˜S18 of the disclosure, actual tests are conducted on different three-dimensional observation types.
As shown in
As shown in
In addition, the disclosure also evaluates the precision of point cloud registration for three-dimensional scene measurement data. The three-dimensional scene measurement data comes from the 3DMatch data set, which collects data from 62 scenes and is commonly used for tasks such as keypoint detection, feature description, and point cloud registration of 3D point clouds. When the registration method described in S1˜S18 of the disclosure is tested on the 3DMatch data set, the success criteria are that the registration translation error is less than 10 cm and the registration angle error is less than 10 degrees. The final success rates are as shown in Table 1 below.
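The success criterion used in this evaluation can be expressed as a simple predicate; the function names below are assumptions for illustration:

```python
def registration_success(t_err_m, angle_err_deg):
    # success: translation error under 10 cm AND angular error under 10 degrees
    return t_err_m < 0.10 and angle_err_deg < 10.0

def success_rate(errors):
    # errors: iterable of (translation_error_m, angle_error_deg) pairs
    results = [registration_success(t, a) for t, a in errors]
    return sum(results) / len(results)

# hypothetical per-pair errors, not results from the disclosure
rate = success_rate([(0.03, 2.0), (0.08, 9.5), (0.15, 3.0), (0.02, 12.0)])
```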
It may be seen from the results that the disclosure can achieve accurate pose registration for three-dimensional scene measurement data in the form of point clouds.
Similarly, based on the same inventive concept, another preferred embodiment of the disclosure also provides a computer electronic device corresponding to the heterogeneous three-dimensional observation registration method based on depth phase correlation provided in the embodiments, which includes a storage and a processor.
The storage is used to store a computer program.
The processor is configured to implement the heterogeneous three-dimensional observation registration method based on depth phase correlation when executing the computer program.
Therefore, based on the same inventive concept, another preferred embodiment of the disclosure also provides a computer-readable storage medium corresponding to the heterogeneous three-dimensional observation registration method based on depth phase correlation provided in the embodiments. The storage medium stores a computer program. When the computer program is executed by the processor, the heterogeneous three-dimensional observation registration method based on depth phase correlation may be implemented.
Specifically, in the computer-readable storage medium or storage of the two embodiments, when the stored computer program is executed by the processor, the steps in the processes of S1˜S18 can be executed. Each step in the processes may be implemented in the form of a program module. In other words, the steps in the processes of S1˜S18 may be implemented in the form of a software product. The computer software product is stored in a storage medium, which includes several commands for enabling a computer device (which may be, for example, a personal computer, a server, or a network device) to execute all or part of the steps of the method described in various embodiments of the disclosure.
It may be understood that the storage medium and the storage may use random access memory (RAM), or may use non-volatile memory (NVM), such as at least one disk storage. At the same time, the storage medium may also be various media storing program codes, for example, a USB flash drive, a portable storage device, a floppy disk, or a CD.
It may be understood that the processor may be a general-purpose processor, including, for example, a central processing unit (CPU), or a network processor (NP); or the processor may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components.
The embodiments are only a preferred solution of the disclosure, but the embodiments are not intended to limit the disclosure. Persons of ordinary skill in the relevant technical fields may also make various changes and modifications without departing from the spirit and scope of the disclosure. Therefore, any technical solution obtained by adopting equivalent substitution or equivalent transformation shall fall within the protection scope of the disclosure.
Compared with the related art, the disclosure has the following beneficial effects.
The disclosure optimizes the phase correlation algorithm into a globally convergent differentiable phase correlation solver, and combines the solver with a simple feature extraction network, thereby establishing a heterogeneous three-dimensional observation registration method based on deep phase correlation. The method can perform pose registration on any three-dimensional observation object without an initial value. In the heterogeneous three-dimensional observation registration method based on deep phase correlation, the entire method framework is differentiable and can be trained end-to-end, with good interpretability and generalization capability. Test results show that the disclosure can achieve accurate three-dimensional observation registration for three-dimensional objects, scene measurements, and medical image data, and the registration performance is higher than that of existing baseline models.
Number | Date | Country | Kind |
---|---|---|---|
202211110592.3 | Sep 2022 | CN | national |
This application is a continuation of international application of PCT application serial no. PCT/CN2023/071661 filed on Jan. 10, 2023, which claims the priority benefit of China application no. 202211110592.3, filed on Sep. 13, 2022. The entirety of each of the above mentioned patent applications is hereby incorporated by reference herein and made a part of this specification.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/CN2023/071661 | Jan 2023 | WO |
Child | 19063301 | US |