This application claims priority to European Patent Application No. 20305205.5 filed on Feb. 28, 2020, incorporated herein by reference in its entirety.
The present disclosure relates to the field of image processing and more precisely to the improvement of classification performance of neural networks.
The disclosure finds a privileged application in the field of image classification for autonomous driving vehicles, but may be applied to process images of any type.
Semantic information provides a valuable source for scene understanding around autonomous vehicles in order to plan their actions and make decisions.
Semantic segmentation of those scenes allows recognizing cars, pedestrians, traffic lanes, etc. Therefore, semantic segmentation is the backbone technique for autonomous driving systems or other automated systems.
Semantic image segmentation typically uses models such as neural networks to perform the segmentation. These models need to be trained.
Training a model typically comprises inputting known images to the model. For these images, a predetermined semantic segmentation is already known (an operator may have prepared the predetermined semantic segmentations of each image by labelling the images). The output of the model is then evaluated in view of the predetermined semantic segmentation, and the parameters of the model are adjusted if the output of the model differs from the predetermined semantic segmentation of an image.
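By way of illustration only, the supervised training loop described above may be sketched as follows. The toy per-pixel linear classifier, the feature dimensions, the learning rate and the number of iterations are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def train_step(W, pixels, labels, lr=0.1):
    """One supervised update: compare the model output against the
    predetermined (labelled) segmentation and adjust the parameters W
    by a cross-entropy gradient step."""
    probs = softmax(pixels @ W)                        # (N, num_classes)
    onehot = np.eye(W.shape[1])[labels]                # predetermined labels
    grad = pixels.T @ (probs - onehot) / len(pixels)   # cross-entropy gradient
    return W - lr * grad

# Toy data: 64 "pixels" with 3 features each, labelled by an operator
# (here simulated by thresholding the first feature).
rng = np.random.default_rng(0)
pixels = rng.normal(size=(64, 3))
labels = (pixels[:, 0] > 0).astype(int)
W = np.zeros((3, 2))
for _ in range(200):
    W = train_step(W, pixels, labels)
```

The loop adjusts the parameters only when the model output differs from the predetermined segmentation, which is exactly the gradient term `probs - onehot` above.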
In order to train a semantic segmentation model, a large number of images and predetermined semantic segmentations are necessary.
For example, it has been observed that the visual condition in bad weather (in particular when there is fog blocking the line of sight) creates visibility problems for drivers and for automated systems. While sensors and computer vision algorithms are constantly getting better, the improvements are usually benchmarked with images taken during good and bright weather. Those methods often fail to work well in other weather conditions. This prevents the automated systems from actually being used: it is not conceivable for a vehicle to avoid varying weather conditions, and the vehicle has to be able to distinguish different objects during those conditions.
It is thus desirable to train semantic segmentation models with varying weather images (images taken during multiple states of visibility due to weather conditions).
However, obtaining semantic segmentation data during those varying weather conditions is particularly difficult and time-consuming.
The disclosure proposes a method that may be used for adapting a model trained for images acquired in good weather conditions to other weather conditions.
More particularly, according to a first aspect, the disclosure proposes a method of adapting an initial model of a neural network into an adapted model, wherein the initial model has been trained with labeled images of a source domain, said method comprising:
The adapted model minimizes a function of the two following distances:
The adapted model may be used for processing new images of the source domain or of the target domain.
In a particular embodiment of the disclosure, the adapted model may be used for classifying or segmenting the new images. The adapted model may also be used for creating bounding boxes enclosing pixels of the new images. The adapted model may also be used to identify a predetermined object in the new images. The adapted model may also be used to compute a measure of the new images, e.g. a light intensity.
From a very general point of view, the disclosure proposes a method of adapting a model trained for images of a source domain to images of a target domain.
In one application of the disclosure, images of the source domain are images acquired in high visibility conditions and images of the target domain are images acquired in low visibility conditions.
Also, the expressions “low visibility conditions” and “high visibility conditions” merely indicate that the visibility (for example according to a criterion set by the person skilled in the art) is better under the “high visibility conditions” than under the “low visibility conditions”; the gap between the two visibility conditions can be chosen by the person skilled in the art according to the application.
According to the disclosure the adapted model is based on a trained model which has been trained for images of the source domain.
This trained model provides good accuracy for the images of the source domain but not for images of the target domain.
According to the disclosure, the adapted model is obtained by adapting weights of an encoder part of the trained model, the architecture of the trained model and the weights of the second part of the trained model being unchanged. This results in a shorter adaptation training time by considerably reducing the complexity of the adaptation while preserving a good accuracy for images of the source domain.
The cut of the initial trained model into an encoder part and a second part can be made at any layer of the initial model.
Selecting this layer may be achieved after trial-and-error, for example using images of the source domain. The person skilled in the art may select this layer while taking into account that:
The disclosure provides two distances D1 and D2.
D1 measures the distance between features of the source domain output of the encoder part of the initial model and features of the source domain output of the encoder part of the adapted model. This measure represents how the accuracy of the processing of images of the source domain degrades.
D2 measures a distribution distance between probabilities of features obtained for images of the source domain and probabilities of features obtained for images of the target domain. For D2 to be relevant, images of the target domain must statistically represent the same scenes as the images of the source domain but the disclosure does not require a correspondence among images of these two domains. D2 then represents the capacity of the adapted model to process images of the source domain and images of the target domain with the same accuracy.
Function f being based on D1 and D2, the disclosure provides an adapted model which is optimized such that the probability distributions are similar for source and target domains features while keeping the accuracy of the processing of images of the source domain close to the one achieved with the trained initial model.
The adapted model is therefore adapted to process new images of the source domain or of the target domain, in other words images acquired regardless of the visibility conditions.
According to a particular embodiment, the function is in the form of (μD2+λD1), where μ and λ are positive real numbers and D1 is the first distance and D2 is the second distance.
These parameters μ and λ may be used to balance the weights of distances D1 and D2.
Other functions f based on D1 and D2 may be used. Preferably, the function is increasing in D1 and increasing in D2.
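As an illustration only, the first distance D1 and the function f may be sketched as follows. The L2 form of D1 and the default values of μ and λ are illustrative assumptions; the disclosure only requires f to be increasing in both distances.

```python
import numpy as np

def feature_distance_d1(feats_initial, feats_adapted):
    """D1: how far the adapted encoder's source-domain features have
    drifted from the frozen initial encoder's features (mean L2)."""
    return float(np.mean((feats_initial - feats_adapted) ** 2))

def objective(d1, d2, mu=1.0, lam=1.0):
    """f = mu*D2 + lambda*D1: increasing in both distances, so
    minimizing f trades domain alignment (D2) against preserving
    source-domain accuracy (D1)."""
    return mu * d2 + lam * d1
```

Because f is increasing in each argument, any decrease of f that holds one distance fixed must decrease the other, which is the balancing behaviour described above.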
According to a particular embodiment, the step of adapting the parameters of the encoder part uses a self-supervision loss to measure the first distance D1.
Therefore, in this embodiment, unlabeled images are used for adapting the trained model into the adapted model, labeled images being used only for training the initial model. This embodiment avoids the need for annotating images or obtaining semantic segmentation data in the target domain, for example for varying visibility conditions.
Measuring D2, the distribution distance between probabilities of features obtained for images of the source domain and probabilities of features obtained for images of the target domain, is complex.
In one embodiment, this distance is obtained statistically using a maximum mean discrepancy metric.
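As an illustration, the (biased) squared maximum mean discrepancy between a batch of source features and a batch of target features may be sketched with a Gaussian kernel; the kernel choice and bandwidth below are illustrative assumptions, not mandated by the disclosure.

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    # Gram matrix of the Gaussian kernel between rows of x and y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(feat_s, feat_t, sigma=1.0):
    """Biased estimate of the squared MMD between two feature batches:
    E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    return float(rbf(feat_s, feat_s, sigma).mean()
                 + rbf(feat_t, feat_t, sigma).mean()
                 - 2 * rbf(feat_s, feat_t, sigma).mean())
```

The statistic is near zero when source and target features follow the same distribution and grows as the two distributions drift apart, which is what D2 is meant to capture.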
According to another embodiment, the second distance D2 is obtained by a second neural network used to train adversarially said encoder part to adapt the parameters of the adapted model.
The second neural network is therefore trained to learn how to measure D2.
In this embodiment, the second neural network may be for example a 1st order Wasserstein neural network or a Jensen-Shannon neural network.
For more information about adversarial training, the person skilled in the art may in particular refer to:
T.-H. Vu, H. Jain, M. Bucher, M. Cord, and P. Perez, “ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation,” in CVPR, 2019; or
Y. Luo, L. Zheng, T. Guan, J. Yu, and Y. Yang, “Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation,” in CVPR, 2019, pp. 2507-2516.
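As an illustration only, the adversarial estimation of D2 can be sketched with a toy second network: a logistic discriminator trained to output 1 on source features and 0 on target features, whose confidence then provides a learned distance term the encoder would be trained to fool. The feature dimensions, learning rate and update rule below are illustrative assumptions, not the ADVENT or Wasserstein formulations of the cited works.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_step(w, b, feat_s, feat_t, lr=0.1):
    """Train the second network to output 1 on source features and 0 on
    target features (one binary cross-entropy gradient step; assumes
    equally sized batches)."""
    ps = sigmoid(feat_s @ w + b)
    pt = sigmoid(feat_t @ w + b)
    grad_w = -(feat_s.T @ (1 - ps) - feat_t.T @ pt) / len(feat_s)
    grad_b = -((1 - ps).mean() - pt.mean())
    return w - lr * grad_w, b - lr * grad_b

def adversarial_d2(w, b, feats):
    """The encoder's adversarial term: -log D(z), large when the
    trained discriminator confidently rejects the features."""
    return float(-np.log(sigmoid(feats @ w + b) + 1e-9).mean())
```

In a full adversarial loop the encoder would then be updated to increase D(z_t), i.e. to make target features indistinguishable from source features.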
According to a second aspect, the disclosure concerns a system for adapting an initial model of a neural network into an adapted model wherein the initial model has been trained with labeled images of a source domain, said system comprising:
said adapted model being used for processing new images of said source domain or of said target domain.
In one embodiment of the disclosure, the system is a computer comprising a processor configured to execute the instructions of a computer program.
According to a third aspect, the disclosure relates to a computer program comprising instructions to execute a method of adapting an initial model as mentioned above.
The disclosure also relates to a storage portion comprising:
wherein the initial model and the adapted model both have an encoder part and a second part configured to process features output from their respective encoder part, the second part of the initial model and the second part of the adapted model having the same parameters.
The disclosure also concerns a vehicle comprising an image acquisition module configured to acquire images, a storage portion comprising an adapted model as mentioned above, and a module configured to process the acquired images using the adapted model.
How the present disclosure may be put into effect will now be described by way of example with reference to the appended drawings, in which:
Reference will now be made in detail to exemplary embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
The disclosure has been implemented with SegNet, MobileNetV2 and DeepLabV3, but other architectures may be used.
More precisely, the method of the disclosure adapts the initial model M{circumflex over (γ)} trained with source domain images xs obtained in high visibility conditions to images of a target domain xt obtained in low visibility conditions (e.g. dark, foggy or snowy conditions).
At step E10, the initial model M{circumflex over (γ)} is trained with source domain images xs obtained in high visibility conditions.
As shown on
In an adaptation step E20, the initial model M{circumflex over (γ)} is adapted to the target domain. The adapted model is noted Mγ. This adaptation step E20 comprises two preparatory steps, a copying step E210 and a dividing step E220 that initialize the adapted model from the initial model, and an adaptation step per se E230 of the adapted model.
The initial model M{circumflex over (γ)} is copied to the adapted model Mγ with its parameters during the copying step E210.
Then, at step E220, the adapted model Mγ is divided into two parts: an encoder part E and a second part F. This division can be made at any layer of the initial model M{circumflex over (γ)}, the output layer of the encoder part E being the input layer of the classification part F.
Selecting this layer may be achieved after trial-and-error, for example using images of the source domain. The person skilled in the art may select this layer while taking into account that:
From experience, good accuracy may be achieved when the cut is made between the 2nd and the 6th layers for networks of between 10 and 15 layers.
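As a sketch only, the division of a trained model into an encoder part E and a second part F may be represented by cutting a list of layer functions at a chosen index; the toy layers below are illustrative assumptions, not an actual segmentation network.

```python
def apply_part(part, x):
    """Run an input through a list of layer functions in order."""
    for layer in part:
        x = layer(x)
    return x

def split_model(layers, cut):
    """Cut a trained model, given as a list of layer functions, into an
    encoder part E (layers before the cut) and a second part F (the
    remaining layers, whose parameters stay frozen during adaptation)."""
    encoder = lambda x: apply_part(layers[:cut], x)
    second = lambda x: apply_part(layers[cut:], x)
    return encoder, second

# Toy 3-layer "model"; the cut index is the designer's choice.
layers = [lambda x: x * 2, lambda x: x + 1, lambda x: x ** 2]
E, F = split_model(layers, cut=2)
```

By construction F(E(x)) reproduces the original model exactly, so before any adaptation the split changes nothing about the model's outputs.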
During the adaptation step E230, the adapted model Mγ is adapted to the target domain by using random images xs of the source domain and random images xt of the target domain. No correspondence exists between these images.
According to the disclosure, the adapted model Mγ has the same architecture as the initial model M{circumflex over (γ)}, only the weights WE of its encoder part E being adapted.
As represented on
The set α of parameters of the second part F is unchanged.
According to the disclosure, the adaptation comprises minimizing a function f of the distances D1 and D2 detailed below.
In this specific embodiment, f is in the form of (μD2+λD1), where λ and μ are real positive numbers.
The adaptation step E230 is represented by
The adaptation step E230 comprises a step E234 of measuring:
The adapted model Mγ is optimized (by adapting the weights of the encoder part at step E238) such that the probability distributions Prp and Prq are similar for source and target domain features (measured by distance D2) and the accuracy on the source domain does not degrade (measured by D1, F being unchanged).
In this specific embodiment, the step E238 of adapting the parameters WE of the encoder part E uses a self-supervision loss to measure the first distance D1.
In this specific embodiment, this optimization consists in minimizing f = (μD2 + λD1) (step E236), where μ and λ are real parameters that can be adjusted to balance D1 and D2.
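As a minimal sketch of steps E234 to E238, assuming an illustrative linear encoder, a kernel-based estimate of D2 and a numerical gradient (none of which are mandated by the disclosure), one adaptation step could look as follows:

```python
import numpy as np

def f_objective(W, W_init, xs, xt, mu=1.0, lam=1.0, sigma=1.0):
    """f = mu*D2 + lam*D1 for a toy linear encoder z = x @ W.
    D1: drift of adapted source features from the frozen initial
    encoder; D2: biased squared MMD between adapted source and
    target features (Gaussian kernel)."""
    zs, zt, zs0 = xs @ W, xt @ W, xs @ W_init
    d1 = np.mean((zs - zs0) ** 2)
    k = lambda a, b: np.exp(-((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
                            / (2 * sigma ** 2))
    d2 = k(zs, zs).mean() + k(zt, zt).mean() - 2 * k(zs, zt).mean()
    return float(mu * d2 + lam * d1)

def adapt_step(W, W_init, xs, xt, lr=0.05, eps=1e-4):
    """One step E238: gradient descent on f (central differences),
    updating only the encoder weights W; the second part F and the
    frozen copy W_init are untouched."""
    grad = np.zeros_like(W)
    for i in np.ndindex(*W.shape):
        dW = np.zeros_like(W)
        dW[i] = eps
        grad[i] = (f_objective(W + dW, W_init, xs, xt)
                   - f_objective(W - dW, W_init, xs, xt)) / (2 * eps)
    return W - lr * grad
```

A real implementation would use backpropagation rather than finite differences; the sketch only shows which quantities enter f and which parameters are updated.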
In one embodiment, at step E234, the second distance D2 can be obtained statistically using a maximum mean discrepancy (MMD) metric.
But in the specific embodiment described here, the second distance D2 is obtained by a second neural network used to adversarially train said encoder part E to adapt (E238) its parameters WE.
This system comprises a preparing module PM and an adapting module AM.
The preparing module is configured to obtain an initial model M{circumflex over (γ)} which has been trained with labeled images xs,
The adapting module AM is configured to adapt the adapted model Mγ to a target domain xt using random images xs of the source domain and random images xt of the target domain as mentioned before.
In this specific embodiment, the system 100 is a computer. It comprises a processor 101, a read only memory 102, and two flash memories 103A, 103B.
The read only memory 102 comprises a computer program PG comprising instructions to execute a method of adapting an initial model as mentioned above when it is executed by the processor 101.
In this specific embodiment, flash memory 103A comprises the initial model M{circumflex over (γ)} and flash memory 103B comprises the adapted model Mγ.
Flash memories 103A and 103B constitute a storage portion according to an embodiment of the disclosure.
In another embodiment, the initial model M{circumflex over (γ)} and the adapted model Mγ are stored in different zones of a same flash memory. Such a flash memory constitutes a storage portion according to another embodiment of the disclosure.
In the specific embodiment described before, the second part F is a classifier.
The claimed method adapts (at step E20) an initial model M{circumflex over (γ)} of a neural network into an adapted model Mγ, the initial model M{circumflex over (γ)} having been trained (at step E10) with labeled images of a source domain.
In this specific embodiment, these labeled images are images xs of the source domain, with their corresponding labeled images of ground truth
The method comprises:
The adapted model Mγ is adapted to a target domain xt using random images xs of the source domain and random images xt of the target domain while fixing the parameters WF of the second part F.
The adapted model Mγ may be used to classify new images of the source domain or of said target domain.