The present invention relates to a training apparatus, an angle estimation apparatus, a training method, and an angle estimation method.
Angle information is useful for training a SAR (synthetic aperture radar) object classifier, and it is especially helpful when training data is limited. Angle information is also helpful for tasks related to optical images, such as face recognition and object recognition.
An angle estimator can estimate the angle information from an input image. The angle information includes, but is not limited to, the pose of an object in the image or the shooting angle of the camera that produced the image. Angle information helps increase the performance of a classifier. For example, a classifier trained with only two images at angles A and B, without knowing the angle information, can only function on images at angles A and B. In contrast, a classifier trained with the same images and knowledge of their angle information can also function on images at angles other than A and B by interpolating the image information between angles A and B. Likewise, in face recognition and object recognition, if angle information is available, it is possible to infer what an object looks like when viewed at a new angle, even if no training data at that new angle is available.
In order to increase the accuracy of the angle estimator, the angle estimator is trained by machine learning methods. One method uses ground truth angle labels to train the angle estimator (for example, refer to Non-Patent Literature 1). In the method, images and their ground truth angle labels are first input. Next, an angle estimator, which is a learnable neural network, estimates the angles from the images. Further, a penalty is computed as the value difference between the estimated angles and the ground truth angles. Furthermore, the angle estimator is updated according to the penalty. After thousands of iterations of updates, the angle estimator estimates angles that match the ground truth angle labels.
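As an illustration only, the following is a minimal sketch of such a label-supervised scheme in Python with PyTorch. The network architecture, the L1 penalty, and the optimizer settings are assumptions made for this sketch; the actual method of Non-Patent Literature 1 may differ.

```python
import torch
import torch.nn as nn

class AngleEstimator(nn.Module):
    """A small CNN that regresses one scalar angle per input image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)   # (B,) estimated angles

estimator = AngleEstimator()
optimizer = torch.optim.Adam(estimator.parameters(), lr=1e-3)

def supervised_step(images, gt_angles):
    # Penalty is the value difference between estimated and ground truth angles.
    penalty = (estimator(images) - gt_angles).abs().mean()
    optimizer.zero_grad()
    penalty.backward()
    optimizer.step()                     # update the angle estimator
    return penalty.item()
```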
Another method uses ground truth structural data (for example, refer to Non-Patent Literature 2). In the method, images and their ground truth structural data are first input. Next, features are extracted from the two-dimensional (2D) images using a learnable feature extractor. Further, angles are estimated from the 2D images using a learnable angle estimator. Furthermore, the structural data is projected according to the estimated angles to obtain projected features. Then, a penalty is computed as the value difference between the extracted features and the projected features. Afterward, the feature extractor and the angle estimator are updated according to the penalty. After thousands of iterations of updates, the angle estimator estimates angles that make the projected features match the extracted features.
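As an illustration only, the following sketch shows how such a structure-based penalty could be computed, assuming the structural data is a batch of 3D point clouds and the projection is a rotation about the vertical axis followed by an orthographic projection. These modeling choices are assumptions; the actual method of Non-Patent Literature 2 may differ.

```python
import torch

def project_structure(structure, angles):
    """Rotate ground-truth 3D points about the vertical axis by the
    estimated angles, then orthographically project them to 2D."""
    # structure: (B, N, 3); angles: (B,) in radians.
    c, s = torch.cos(angles), torch.sin(angles)
    zero, one = torch.zeros_like(c), torch.ones_like(c)
    R = torch.stack([torch.stack([c, -s, zero], -1),
                     torch.stack([s,  c, zero], -1),
                     torch.stack([zero, zero, one], -1)], -2)   # (B, 3, 3)
    return (structure @ R.transpose(1, 2))[..., :2]             # (B, N, 2)

def structure_penalty(extracted, structure, est_angles):
    # Value difference between extracted features and projected features.
    return (extracted - project_structure(structure, est_angles)).abs().mean()
```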
In the method using ground truth angle labels, supervision from angle labels is required. In the method using ground truth structural data, supervision from structural data is required. It is desirable to train an angle estimator without using any supervision such as angle labels or structural data.
Thus, the purpose of the present invention is to provide a training apparatus, an angle estimation apparatus, a training method, and an angle estimation method capable of training a model with respect to angle estimation without using any annotation.
An exemplary aspect of a training apparatus includes one or more feature extraction means for extracting features from input images, one or more angle estimation means for estimating angles from the input images, angle difference computation means for calculating a difference between angles estimated by the one or more angle estimation means, rigid transformation means for transforming the feature of the input image according to the difference, matching loss computation means for calculating a matching loss between a non-transformed feature extracted by the one or more feature extraction means and the feature transformed by the rigid transformation means, and updating means for updating at least the one or more angle estimation means with reference to the matching loss, wherein the rigid transformation means transforms the feature in such a way that the feature appears as if it had been extracted from an image at the same angle as the image from which the non-transformed feature has been extracted.
An exemplary aspect of an angle estimation apparatus includes an angle estimation means for estimating an angle from an input image, wherein the angle estimation means has been trained together with one or more feature extraction means and rigid transformation means in such a way that the extracted image feature, after being transformed according to the estimated angle, appears as if it had been extracted from an image at an angle different from that of the original input image.
An exemplary aspect of a training method includes extracting features from input images, estimating angles from the input images by one or more angle estimation means, calculating a difference between the estimated angles, rigid-transforming the extracted feature of the input image according to the difference, calculating a matching loss between a non-transformed feature and the rigid-transformed feature, and updating at least the one or more angle estimation means with reference to the matching loss, wherein the feature is transformed in such a way that it appears as if it had been extracted from an image at the same angle as the image from which the non-transformed feature has been extracted.
An exemplary aspect of an angle estimation method includes estimating the angle from an input image using an angle estimation apparatus that has been trained together with one or more feature extraction means and rigid transformation means in such a way that the extracted image feature, after being transformed according to the estimated angle, appears as if it had been extracted from an image at an angle different from that of the original input image.
An exemplary aspect of a training program causes a computer to execute extracting features from input images, estimating angles from the input images by one or more angle estimation means, calculating a difference between the estimated angles, rigid-transforming the feature of the input image according to the difference, calculating a matching loss between a non-transformed feature and the rigid-transformed feature, and updating at least the one or more angle estimation means with reference to the matching loss, wherein the feature is transformed in such a way that it appears as if it had been extracted from an image at the same angle as the image from which the non-transformed feature has been extracted.
An exemplary aspect of an angle estimation program causes a computer to execute estimating the angle from an input image using an angle estimation apparatus that has been trained together with one or more feature extraction means and rigid transformation means in such a way that the extracted image feature, after being transformed according to the estimated angle, appears as if it had been extracted from an image at an angle different from that of the original input image.
The present invention allows training with respect to angle estimation without using any annotation related to angle or object structure.
Hereinafter, example embodiments of the present invention are described with reference to the drawings. In each of the example embodiments described below, SAR images are assumed as the images. However, the images are not limited to SAR images. As an example, the input images can also be optical images, for example, images photographed by a smartphone.
The training apparatus 101 of the first example embodiment includes a first feature extractor 111, a second feature extractor 112, a first angle estimator 121, a second angle estimator 122, an angle difference computation section 130, a rigid transformation section 140, a matching loss computation section 150, and a model updating section 160.
Image data I1 is inputted to the first feature extractor 111 and the first angle estimator 121. Image data I2 is inputted to the second feature extractor 112 and the second angle estimator 122. The image data I1 and I2 may be a batch of images. An image corresponding to image data I1 is referred to as the first image. An image corresponding to image data I2 is referred to as the second image. Note that the first feature extractor 111 and the second feature extractor 112 can be configured as a single section. The first angle estimator 121 and the second angle estimator 122 can also be configured as a single section.
The relation between the first image, corresponding to the input image data I1, and the second image, corresponding to the input image data I2, is as follows. The second image has a different angle from the first image. As an example, the second image may be an image which contains the same object, or another object from the same class category, as the first image, but which has been taken at a different view (shooting angle or viewing angle) from the first image. The first image and the second image may be taken at the same time or at different times.
The first feature extractor 111 extracts a feature f1 from the input image data I1. The second feature extractor 112 extracts a feature f2 from the input image data I2. The first angle estimator 121 estimates an angle θ̂1 from the input image data I1. The second angle estimator 122 estimates an angle θ̂2 from the input image data I2. Hereinafter, θ̂1 and θ̂2 are referred to as the estimated angles; the circumflex (hat) over θ indicates an estimated value. The angle difference computation section 130 calculates the difference Δθ = θ̂2 − θ̂1.
The rigid transformation section 140 applies a rigid transform according to Δθ to the feature f1 of the first image, transforming f1 into a novel feature f1→2 that appears as if it had been extracted from an image at the same view as the second image. For example, the rigid transformation section 140 transforms the feature f1 of the input image data I1 by rotating f1 about any axis and by any angle. The feature f1 may also be transformed using other transformation methods, as long as the result appears as if it had been extracted from an image at a different view.
The matching loss computation section 150 calculates a matching loss between the feature obtained from the first image through the rigid transform and the feature extracted from the second image. The model updating section 160 updates at least one of the learnable feature extractors 111 and 112 and the learnable angle estimators 121 and 122 with reference to the matching loss, in such a way that the transformed novel feature f1→2 matches the non-transformed feature f2.
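As an illustration only, the following sketch shows one training iteration of the present example embodiment, assuming the features are 2D point sets of shape (B, N, 2), the rigid transform is an in-plane rotation by Δθ, and the four learnable modules are PyTorch modules; the apparatus itself does not depend on these choices.

```python
import torch

def rotate2d(points, dtheta):
    """Rigid in-plane rotation of point-set features; points: (B, N, 2)."""
    c, s = torch.cos(dtheta), torch.sin(dtheta)
    R = torch.stack([torch.stack([c, -s], -1),
                     torch.stack([s,  c], -1)], -2)   # (B, 2, 2)
    return points @ R.transpose(1, 2)

def training_step(I1, I2, feat1, feat2, ang1, ang2, optimizer):
    f1, f2 = feat1(I1), feat2(I2)        # feature extraction (111, 112)
    t1, t2 = ang1(I1), ang2(I2)          # angle estimation θ̂1, θ̂2 (121, 122)
    dtheta = t2 - t1                     # angle difference Δθ (130)
    f1to2 = rotate2d(f1, dtheta)         # rigid transformation (140)
    loss = (f1to2 - f2).abs().mean()     # matching loss |f1→2 − f2| (150)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                     # model update (160)
    return loss.item()
```

Because the rotation is differentiable, the matching loss back-propagates through the rigid transformation into the angle estimators, so no angle labels or structural data are needed.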
Next, the operation of the training apparatus 101 will be explained with reference to a flowchart.
The training apparatus 101 receives initial model parameters (step S100). The initial model parameters include initial model parameters for the first feature extractor 111, the second feature extractor 112, the first angle estimator 121, and the second angle estimator 122. The received initial model parameters are supplied to the first feature extractor 111, the second feature extractor 112, the first angle estimator 121, and the second angle estimator 122.
The training apparatus 101 receives input image data I1 (step S101). The first feature extractor 111 extracts a feature f1 from the first image (step S111). The first angle estimator 121 estimates an angle of the first image (step S121). The first angle estimator 121 outputs the estimated angle θ̂1.
The training apparatus 101 receives input image data I2 (step S102). The second feature extractor 112 extracts a feature f2 from the second image (step S112). The second angle estimator 122 estimates an angle of the second image (step S122). The second angle estimator 122 outputs the estimated angle θ̂2.
Note that the process of step S111 and the process of step S112 can be executed simultaneously. The process of step S111 and the process of step S121 can be executed simultaneously. The process of step S112 and the process of step S122 can be executed simultaneously.
The angle difference computation section 130 calculates the difference Δθ between the estimated angles θ̂1 and θ̂2 (step S130). The rigid transformation section 140 applies a rigid transform according to Δθ to the feature f1 extracted from the first image, transforming it into a novel feature f1→2 that appears as if it had been extracted from an image at the same view as the second image (step S140).
The matching loss computation section 150 calculates |f1→2 − f2| as a matching loss (step S150). The model updating section 160 determines whether the matching loss has converged or not (step S160). When the matching loss has converged (Yes in step S160), the process proceeds to step S162. When the matching loss has not converged (No in step S160), the process proceeds to step S161. For example, the model updating section 160 compares the matching loss with a predetermined threshold to determine whether the matching loss has converged.
In step S161, the model updating section 160 updates the model parameters for the first feature extractor 111, the second feature extractor 112, the first angle estimator 121, and the second angle estimator 122, with reference to the matching loss calculated by the matching loss computation section 150. Then, the process returns to steps S111 and S112.
In step S162, the model updating section 160 stores the trained model parameters in a storage medium (not shown).
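As an illustration only, the following sketch, which reuses training_step from the sketch above, shows the overall loop corresponding to steps S100 to S162; the threshold value, the data loader, and the file name are assumptions.

```python
import torch

THRESHOLD = 1e-3   # assumed convergence threshold for step S160

def train(loader, feat1, feat2, ang1, ang2, optimizer, max_iters=100000):
    for _, (I1, I2) in zip(range(max_iters), loader):
        # Steps S101-S161: one full forward pass, loss, and update.
        loss = training_step(I1, I2, feat1, feat2, ang1, ang2, optimizer)
        if loss < THRESHOLD:             # step S160: has the loss converged?
            break
    # Step S162: store the trained model parameters in a storage medium.
    torch.save({'ang1': ang1.state_dict(), 'ang2': ang2.state_dict(),
                'feat1': feat1.state_dict(), 'feat2': feat2.state_dict()},
               'trained_model.pt')
```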
Next, the mechanism of the training will be explained with reference to an explanatory diagram.
At the training stage, images of objects taken at different views are input. As an example, assume an image I0° taken at a view of 0 degrees and an image I90° taken at a view of 90 degrees.
Assume that the feature extracted from I0° is referred to as f0°, and that the feature extracted from I90° is referred to as f90°.
An angle estimator (for example, the first angle estimator 121 or the second angle estimator 122) estimates angles from the images I0° and I90°. As an example, assume that the estimated angle θ̂1 is 20 degrees and the estimated angle θ̂2 is 65 degrees. In this case, the difference Δθ is 45 degrees.
A rigid transformation section rotates f0° by 45 degrees to obtain the transformed feature f0°→90°.
A matching loss between the transformed feature f0°→90°, which has been transformed from the view of 0 degrees toward the view of 90 degrees, and the non-transformed feature f90° at the view of 90 degrees is expressed as |f0°→90° − f90°|.
The angle estimator, or both the feature extractor and the angle estimator, is updated with reference to the matching loss. If the entire computation flow, from inputting images to obtaining the matching loss, is repeated for a sufficiently large number of iterations, the matching loss tends to be minimized toward zero. In this example, the loss becomes small only when the estimated angle difference Δθ approaches the true difference between the views, which is 90 degrees, so minimizing the loss drives the angle estimates toward the correct values.
At the testing or application stage, given a newly obtained image of an object from the same class, the trained angle estimator can estimate its shooting angle correctly.
In the present example embodiment, an angle θc of a canonical view is input to the first angle difference computation section 131 and the second angle difference computation section 132. θc is predetermined by a user. The first angle difference computation section 131 calculates a difference Δθ1 between θc and θ̂1. The second angle difference computation section 132 calculates a difference Δθ2 between θc and θ̂2.
The first rigid transformation section 141 applies a rigid transform according to Δθ1 to the feature f1 extracted from the first image to transform f1 into a novel feature f1→c. The second rigid transformation section 142 applies a rigid transform according to Δθ2 to the feature f2 extracted from the second image to transform f2 into a novel feature f2→c.
In the present example embodiment, the matching loss computation section 151 calculates |f1→c − f2→c| as a matching loss.
Next, the operation of the training apparatus 102 will be explained with reference to a flowchart.
In the present example embodiment, the first angle difference computation section 131 calculates the difference Δθ1 between θc, predetermined by a user, and θ̂1 in step S131. The second angle difference computation section 132 calculates the difference Δθ2 between θc and θ̂2 in step S132.
The first rigid transformation section 141 applies a rigid transform according to Δθ1 to the feature f1, transforming it into a novel feature f1→c that appears as if it had been extracted from an image at the canonical view (step S141). The second rigid transformation section 142 applies a rigid transform according to Δθ2 to the feature f2, transforming it into a novel feature f2→c that appears as if it had been extracted from an image at the canonical view (step S142).
The matching loss computation section 151 calculates |f1→c − f2→c| as a matching loss (step S151).
Although the angle difference in the first example embodiment is computed from the estimated angles of both the first input image and the second input image, in the present example embodiment the angle difference is calculated from the estimated angle of a single image (the first image or the second image) and a predetermined canonical angle. While the angle estimates are still inaccurate, the difference Δθ = θ̂2 − θ̂1 in the first example embodiment accumulates the errors of two estimates, up to 2 units of error, whereas each of Δθ1 and Δθ2 in the present example embodiment involves only one estimate and the exact canonical angle θc, that is, 1 unit of error. Consequently, the present example embodiment may be more robust than the first example embodiment.
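As an illustration only, the following sketch shows one training iteration of the present example embodiment, reusing rotate2d from the earlier sketch; the value of θc and the point-set feature representation are assumptions.

```python
import torch

THETA_C = 0.0   # canonical view angle θc, predetermined by the user (assumed)

def training_step_canonical(I1, I2, feat1, feat2, ang1, ang2, optimizer):
    f1, f2 = feat1(I1), feat2(I2)
    d1 = THETA_C - ang1(I1)              # Δθ1 = θc − θ̂1 (section 131)
    d2 = THETA_C - ang2(I2)              # Δθ2 = θc − θ̂2 (section 132)
    f1c = rotate2d(f1, d1)               # f1 → f1→c (section 141)
    f2c = rotate2d(f2, d2)               # f2 → f2→c (section 142)
    loss = (f1c - f2c).abs().mean()      # matching loss |f1→c − f2→c| (151)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```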
The decoder 170 generates a reconstructed image Î1→2. The decoder 170 also generates a reconstructed image Î1.
In the present example embodiment, the matching loss computation section 152 calculates a matching loss between Î1→2 and I2. In addition, the matching loss computation section 152 calculates a matching loss between Î1 and I1. The model updating section 161 updates at least one of the learnable feature extractor 110, the learnable angle estimators 121 and 122, and the decoder 170 with reference to the matching losses.
Next, the operation of the training apparatus 103 will be explained with reference to a flowchart.
In the present example embodiment, the decoder 170 generates a reconstructed image Î1→2 using the transformed feature f1→2 obtained from the feature f1 by the rigid transformation section 140 (step S170). The decoder 170 further generates a reconstructed image Î1 using the feature f1 in step S170.
The matching loss computation section 152 calculates the difference between the reconstructed image Î1→2 and the image data I2, and the difference between the reconstructed image Î1 and the image data I1, as the matching loss (step S152). In step S161A, the model updating section 161 updates the model parameters for the feature extractor 110, the first angle estimator 121, the second angle estimator 122, and the decoder 170, with reference to the matching loss calculated by the matching loss computation section 152. The model updating section 161 updates the decoder 170 so that the matching loss relative to the reconstructed images decreases.
Features are high-level abstractions of images; that is, features contain much less information than the images themselves. Thus, unlike the previous example embodiments, which compare at the feature level, the present example embodiment compares at the image level and is expected to be more robust. This is because comparing the reconstructed transformed image with the original image encourages details to match, whereas comparing the transformed feature with the non-transformed feature may ignore the details and focus only on matching the outline.
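As an illustration only, the following sketch shows one training iteration of the present example embodiment with the decoder 170, reusing rotate2d from the earlier sketch; the decoder architecture and the equal weighting of the two image-level losses are assumptions.

```python
import torch

def training_step_decoder(I1, I2, feat, ang1, ang2, decoder, optimizer):
    f1 = feat(I1)                        # single feature extractor 110
    dtheta = ang2(I2) - ang1(I1)         # Δθ from the two angle estimators
    f1to2 = rotate2d(f1, dtheta)         # rigid transformation (140)
    I1to2_hat = decoder(f1to2)           # reconstructed image Î1→2 (step S170)
    I1_hat = decoder(f1)                 # reconstructed image Î1   (step S170)
    # Matching losses at the image level (step S152).
    loss = (I1to2_hat - I2).abs().mean() + (I1_hat - I1).abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                     # step S161A
    return loss.item()
```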
The first pre-processor 181 applies a predetermined pre-processing to the first image. The pre-processed image data is supplied to the first feature extractor 111 and the first angle estimator 121. The second pre-processor 182 applies a predetermined pre-processing to the second image. The pre-processed image data is supplied to the second feature extractor 112 and the second angle estimator 122.
Next, the operation of the training apparatus 104 will be explained with reference to a flowchart.
In the present example embodiment, the first pre-processor 181 applies a predetermined pre-processing to the first image in step S181. Specifically, the first pre-processor 181 processes the image data I1. The second pre-processor 182 applies a predetermined pre-processing to the second image in step S182. Specifically, the second pre-processor 182 processes the image data I2.
One example of the pre-processing is background removal. Another example is noise reduction. As an example of background removal, assume that a picture of a car on a street is obtained. If only the car is to be recognized, the pre-processor removes the background, that is, the street. The background and the car can be separated using image segmentation methods, for example, so that only the image pixels of the car remain.
In general, images, especially SAR images, contain noise. When noise reduction is performed, the pre-processor can remove noise from optical or SAR images. The pre-processor may use a median filter, Gaussian blur, fast Fourier transform (FFT) based methods, or even learnable neural networks, for example, to remove noise.
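As an illustration only, the following sketch shows the two pre-processing examples with NumPy and SciPy. The intensity-threshold segmentation is a deliberately simplistic stand-in for a real image segmentation method, and the filter size and threshold are assumed values.

```python
import numpy as np
from scipy.ndimage import median_filter

def denoise(image):
    # A median filter suppresses speckle-like noise common in SAR images.
    return median_filter(image, size=3)

def remove_background(image, threshold=0.2):
    # Crude segmentation by intensity (assumed stand-in for a real method):
    # pixels at or below the threshold are treated as background and zeroed.
    return np.where(image > threshold, image, 0.0)
```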
Note that the pre-process is not limited to the background removal and the noise reduction. The pre-process can also be designed as a learnable neural network that extracts low-level features. These low-level features are shared by the feature extractors and the angle estimators. By doing so, the number of trainable parameters of the neural network can be reduced. In other words, the training of the network can be more efficient.
By removing the background or reducing noise, for example, the extracted features contain merely or mainly the information of the objects. This encourages the angle estimation to be more accurate.
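As an illustration only, the following sketch shows a pre-processor designed as a learnable neural network whose low-level features are shared by a feature-extractor head and an angle-estimator head, as described above; the layer sizes are assumptions. Because the shared stem is computed only once, the two heads need fewer trainable parameters of their own.

```python
import torch.nn as nn

shared_stem = nn.Sequential(               # learnable low-level pre-processor
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())

feature_head = nn.Conv2d(32, 64, 3, padding=1)       # feature extractor head
angle_head = nn.Sequential(nn.AdaptiveAvgPool2d(1),  # angle estimator head
                           nn.Flatten(), nn.Linear(32, 1))

def extract_and_estimate(image):
    low = shared_stem(image)    # low-level features are computed only once
    return feature_head(low), angle_head(low).squeeze(-1)
```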
In the present example embodiment, the first post-processor 191 applies a predetermined post-processing to the feature extracted by the first feature extractor 111. The post-processed feature is supplied to the rigid transformation section 140 as the feature f1. The second post-processor 192 applies a predetermined post-processing to the feature extracted by the second feature extractor 112. The post-processed feature is supplied to the matching loss computation section 150 as the feature f2.
Next, the operation of the training apparatus 105 will be explained with reference to a flowchart.
In the present example embodiment, the first post-processor 191 performs a predetermined post-process on the feature extracted by the first feature extractor 111. Specifically, the first post-processor 191 performs processing that enables the angle estimation to be performed more accurately. The second post-processor 192 performs a predetermined post-process on the feature extracted by the second feature extractor 112. Specifically, the second post-processor 192 performs processing that enables the angle estimation to be performed more accurately.
One example of the post-processing is normalization. Another example is masking. As an example of normalization, assume the features are 3D point clouds. By performing point normalization, the coordinates of all points are normalized into the range [0, 1]. Before normalization, the coordinates of some points may have very large values, e.g., 10, while others may be very small, e.g., 0.1; this large spread causes the matching loss to become very large, and as a result the model is not easy to train. In the present example embodiment, normalization suppresses this unwanted increase in the matching loss.
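As an illustration only, the following sketch shows the point normalization post-process, assuming the features are batches of 3D point clouds of shape (B, N, 3).

```python
import torch

def normalize_points(points, eps=1e-8):
    # points: (B, N, 3); rescale each point cloud's coordinates into [0, 1].
    lo = points.amin(dim=1, keepdim=True)
    hi = points.amax(dim=1, keepdim=True)
    return (points - lo) / (hi - lo + eps)
```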
As an example of masking, assume the features are feature maps; after a rigid transformation, values at the boundary are lost. A masking filter retains only the values in the central part. In the present example embodiment, masking is used to make the transformed features, which have lost values at the boundary, comparable to the non-transformed features. Note that the post-processing is not limited to normalization or masking; it can also be a learnable neural network such as a conditional generative network.
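As an illustration only, the following sketch shows the masking post-process for feature maps; the border width is an assumed value.

```python
import torch

def center_mask(feature_map, margin=4):
    # feature_map: (B, C, H, W); zero out a border of `margin` pixels whose
    # values become invalid after rotation (the margin width is assumed).
    mask = torch.zeros_like(feature_map)
    mask[..., margin:-margin, margin:-margin] = 1.0
    return feature_map * mask

# Both features are masked identically before the matching loss, e.g.:
# loss = (center_mask(f1to2) - center_mask(f2)).abs().mean()
```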
By point normalization or feature map masking, the features (including transformed features and non-transformed features) are more suitable for comparison. This encourages the angle estimation to be more accurate.
The training device 100 is equivalent to any of the training apparatuses 101 to 105 of the first to fifth example embodiments. The angle estimator 61 is equivalent to, for example, the first angle estimator 121 or the second angle estimator 122 taken from any of the training apparatuses 101 to 105 of the first to fifth example embodiments.
Thus, the angle estimator 61 can be trained by the training device 100 as described in the first to fifth embodiments.
The angle estimator 61 of the present example embodiment can estimate the angle information in an input image correctly.
Each component in each of the above example embodiments may be configured with a piece of hardware or a piece of software. Alternatively, the components may be configured with a plurality of pieces of hardware or a plurality of pieces of software. Further, part of the components may be configured with hardware and the other part with software.
The functions (processes) in the above example embodiments may be realized by a computer having a processor such as a central processing unit (CPU), a memory, etc. For example, a program for performing the method (processing) in the above example embodiments may be stored in a storage device (storage medium), and the functions may be realized with the CPU executing the program stored in the storage device.
The computer can realize the function of the angle estimator 61 in the angle estimation apparatus described above.
A storage device 1001 is, for example, a non-transitory computer readable medium. The non-transitory computer readable medium is one of various types of tangible storage media. Specific examples of non-transitory computer readable media include a magnetic storage medium (for example, a hard disk), a magneto-optical storage medium (for example, a magneto-optical disc), a compact disc-read only memory (CD-ROM), a compact disc-recordable (CD-R), a compact disc-rewritable (CD-R/W), and a semiconductor memory (for example, a mask ROM, a programmable ROM (PROM), an erasable PROM (EPROM), or a flash ROM).
The program may also be stored in various types of transitory computer readable media. The program is supplied to a transitory computer readable medium through, for example, a wired or wireless communication channel, that is, through electric signals, optical signals, or electromagnetic waves.
The memory 1002 is a storage means implemented by a RAM (random access memory), for example, and temporarily stores data when the CPU 1000 executes processing. A program held in the storage device 1001 or a transitory computer readable medium is transferred to the memory 1002, and the CPU 1000 executes processing based on the program in the memory 1002.
In each of the example embodiments described above, the images are typically SAR images. However, the images are not limited to SAR images. As an example, the images can also be optical images, for example, images photographed by a smartphone.
Since the trained angle estimation apparatus of the above example embodiments can estimate the angle information of an image correctly, it can be integrated into other image processing systems to provide angle information and improve the overall performance of those systems. For example, when the angle estimation apparatus provides an estimated head pose of a human face image to a face recognition system, the recognition accuracy of the system improves because the system has extra knowledge about the head pose.
A part of or all of the above example embodiments may also be described as, but not limited to, the following supplementary notes.
(Supplementary note 1) A training apparatus comprising:
(Supplementary note 2) The training apparatus according to Supplementary note 1, wherein the updating means also updates the one or more feature extraction means.
(Supplementary note 3) The training apparatus according to Supplementary note 1 or 2, further comprising
(Supplementary note 4) The training apparatus according to Supplementary note 1 or 2, further comprising
(Supplementary note 5) A training apparatus comprising:
(Supplementary note 6) A training apparatus comprising:
(Supplementary note 7) The training apparatus according to any one of Supplementary notes 1 to 6, wherein
(Supplementary note 8) An angle estimation apparatus comprising:
(Supplementary note 9) A training method for training an apparatus having one or more angle estimation means, comprising:
(Supplementary note 10) The training method for training the apparatus having one or more feature extraction means according to Supplementary note 9,
(Supplementary note 11) An angle estimation method comprising:
(Supplementary note 12) A computer readable information recording medium storing a training program, for training an apparatus having one or more angle estimation means, causing a computer to execute:
(Supplementary note 13) The computer readable information recording medium according to Supplementary note 12, wherein
(Supplementary note 14) A computer readable information recording medium storing an angle estimation program causing a computer to execute:
Although the invention of the present application has been described above with reference to example embodiments, the present invention is not limited to the above example embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.