The present application claims the benefit under 35 U.S.C. § 119 of European Patent Application No. EP 23 17 6815.1 filed on Jun. 1, 2023, which is expressly incorporated herein by reference in its entirety.
The present invention relates to a device and a computer-implemented method for evaluating a digital image.
Digital images are evaluated for example in the field of at least partially autonomous vehicles or automated optical inspection devices.
A device and a computer-implemented method for evaluating a digital image according to features of the present invention provide a simple score for classifying digital images.
According to an example embodiment of the present invention, the method comprises providing the digital image, providing a first part of a predetermined model, wherein the predetermined model is configured for determining a semantic segmentation of the digital image with a second part of the predetermined model, wherein the first part is configured to determine a feature depending on the digital image, wherein the second part is configured to determine the semantic segmentation depending on the feature, wherein the method comprises determining the feature depending on the digital image with the first part, providing a set of quantizations for quantizing the feature, determining the quantization of the feature depending on the set of quantizations and depending on the feature, determining a quantization error depending on the feature and the quantization, and evaluating the digital image depending on the quantization error. The feature and the quantization may be vectors. The quantization error may be a distance between the vectors that serves as the simple score to evaluate the digital image.
The method may comprise providing the second part of the model, wherein the second part is configured for determining the semantic segmentation of the digital image depending on the quantization, and determining the semantic segmentation of the digital image depending on the quantization with the second part. The score that is provided in addition to the semantic segmentation allows evaluating the semantic segmentation.
In one example embodiment of the present invention, providing the set of quantizations comprises providing a reference for the semantic segmentation of the digital image, determining the semantic segmentation of the digital image, and determining a quantization in the set of quantizations depending on a difference between the reference and the semantic segmentation and depending on the quantization error. The first part and the second part are predetermined parts of the predetermined model. The quantization is determined without modifying the first part or the second part.
In one example embodiment of the present invention, providing the pretrained model comprises training the first part to determine the feature and the second part to determine the semantic segmentation depending on the feature. The set of quantizations is inserted into the pretrained model between the pretrained first part and the pretrained second part.
In one example embodiment of the present invention, the method comprises determining the feature with a predetermined normalization, and determining the quantization for the feature with the predetermined normalization. The normalization normalizes the feature and the quantization in the same way in order to facilitate the determination of the quantization error. The vectors representing feature and quantization may be normalized to unit length.
In one example embodiment of the present invention, the method comprises upscaling the feature from a first scale to a second scale, determining the quantization in the second scale from the feature in the second scale, downscaling the quantization from the second scale to the first scale and determining the quantization error in the first scale. The change of the resolution may improve accuracy.
Preferably, according to an example embodiment of the present invention, the feature is a vector and the quantization of the feature is a vector, wherein determining the quantization error comprises determining a cosine distance between the feature and the quantization of the feature.
According to an example embodiment of the present invention, providing the digital image may comprise capturing the digital image in particular with a camera of an at least partially autonomous vehicle or an automated optical inspection device.
According to an example embodiment of the present invention, evaluating the digital image may comprise detecting an anomaly if the quantization error exceeds a threshold or not detecting the anomaly otherwise.
According to an example embodiment of the present invention, the device for evaluating the digital image comprises at least one processor and at least one storage for storing the digital image and instructions that, when executed by the at least one processor, cause the at least one processor to execute the method, wherein the at least one processor is configured for executing the instructions. The device according to the present invention provides advantages that correspond to the advantages the method of the present invention provides.
A program may be provided, wherein the program comprises instructions that, when executed by at least one processor, cause the at least one processor to execute the method. The program provides advantages that correspond to the advantages the method provides.
Further embodiments of the present invention are derived from the following description and the figures.
The digital image is for example a video image, a radar image, a LiDAR image, an ultrasonic image, a motion image, or a thermal image.
The device 100 comprises at least one processor 102 and at least one storage 104 for storing the digital image x and instructions.
The instructions cause the at least one processor 102 to execute a method for evaluating the digital image x, when executed by the at least one processor 102.
The at least one processor 102 is configured for executing the instructions.
The device 100 may comprise a camera 106 or an interface for receiving a digital image from the camera 106.
A program may comprise the instructions.
The method for evaluating the digital image x comprises a step 202.
In the step 202, the digital image x is provided.
Providing the digital image x may comprise capturing the digital image x.
According to an example, the digital image x is captured with the camera 106.
The camera 106 is for example part of or mounted to an at least partially autonomous vehicle or an automated optical inspection device.
The method comprises a step 204.
In step 204, a predetermined model G̃ is provided.
The predetermined model G̃=D∘F comprises a first part F and a second part D.
In the example, the predetermined model G̃ is a pretrained semantic segmentation network, i.e., an artificial neural network.
The predetermined model G̃ is for example determined in a training based on a data set X={xi,yi}i=1, . . . , N comprising N input images xi∈[0,1]c and corresponding references yi.
The second part D comprises a last layer of the network G̃. The first part F comprises the other layers of the network G̃.
The first part is configured to determine a feature ze=F(x) depending on the digital image x.
The predetermined model G̃ is configured for determining the semantic segmentation ŷ of the digital image x with the second part D.
The second part D is configured to determine the semantic segmentation ŷ depending on the feature ze.
Providing the pretrained model may comprise training the first part F to determine the feature ze and the second part D to determine the semantic segmentation ŷ depending on the feature ze.
The method comprises a step 206.
The step 206 comprises providing a predetermined set of quantizations Q̃ for quantizing the feature ze.
The method comprises a step 208.
In the step 208, the feature ze is determined depending on the digital image x with the first part F.
The feature ze is for example determined with a predetermined normalization N.
The method comprises a step 210.
In the step 210, a quantization zq of the feature ze is determined depending on the set of quantizations Q̃ and depending on the feature ze.
The quantization zq for the feature ze is for example determined with the predetermined normalization N, e.g., as the quantization in the normalized set of quantizations Q=N(Q̃) that is nearest to the normalized feature N(ze).
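For illustration, this quantization step may be sketched as follows, assuming unit-length normalization and a Euclidean nearest-neighbor lookup (the concrete distance measure and all names are illustrative assumptions, not mandated by the description):

```python
import numpy as np

def normalize(v, eps=1e-12):
    # Normalization N: scale vectors to unit length along the last axis.
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def quantize(z_e, codebook):
    # Nearest-neighbor lookup: return the normalized codebook vector
    # (quantization zq) closest to the normalized feature ze.
    q = normalize(codebook)
    z = normalize(z_e)
    idx = np.argmin(np.linalg.norm(q - z, axis=-1))
    return q[idx]

codebook = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # set of quantizations
z_e = np.array([0.9, 0.1])                                  # feature from F
z_q = quantize(z_e, codebook)                               # nearest quantization
```

In this sketch the codebook plays the role of the set of quantizations and the quantization zq is the codebook vector nearest to the normalized feature.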
The method comprises a step 212.
In the step 212, a quantization error Equant is determined depending on the feature ze and the quantization zq.
According to an example, the feature ze is a vector and the quantization zq of the feature ze is a vector. In the example, the feature ze has the same dimension as the quantization zq.
According to an example, determining the quantization error comprises determining a cosine distance between the feature ze and the quantization zq of the feature ze.
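The cosine distance used as quantization error may be computed, for example, as follows (names are illustrative):

```python
import numpy as np

def cosine_distance(a, b, eps=1e-12):
    # Quantization error Equant as cosine distance: 1 - cos(angle(a, b)).
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - cos

e_identical = cosine_distance(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
e_orthogonal = cosine_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

For identical vectors the error is near zero; for orthogonal vectors it is near one, which matches its use as a bounded score.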
The method comprises a step 214.
In the step 214, the semantic segmentation ŷ of the digital image x is determined depending on the quantization zq with the second part D.
In one example, the semantic segmentation ŷ is determined with the normalized set of quantizations Q.
In one example, the semantic segmentation ŷ is determined with the set of quantizations Q̃.
G denotes an amended model resulting from adding the set of quantizations Q̃ or the normalized set of quantizations Q to the predetermined model G̃.
The second part D is configured for determining the semantic segmentation ŷ of the digital image x depending on a quantization zq of the feature ze. In the example, the dimensions of the quantization zq and the feature ze are the same.
The method comprises a step 216.
In the step 216, the digital image x is evaluated depending on the quantization error Equant.
Evaluating the digital image may comprise detecting an anomaly if the quantization error Equant exceeds a threshold or not detecting the anomaly otherwise.
The quantization error is in one example used as a signal for determining whether a region in the digital image is anomalous or nominal.
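Evaluating a per-pixel quantization error map against the threshold may be sketched as follows (the error values and the threshold are illustrative):

```python
import numpy as np

def anomaly_mask(e_quant, t):
    # Detect an anomaly wherever the quantization error exceeds threshold t.
    return e_quant > t

e_quant = np.array([[0.02, 0.10],
                    [0.85, 0.01]])     # illustrative per-pixel error map
mask = anomaly_mask(e_quant, t=0.5)    # only the 0.85 pixel is flagged
```

The resulting Boolean mask marks anomalous regions and nominal regions of the digital image.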
The step 216 may comprise triggering an action when the anomaly is detected. The step 216 may comprise triggering an action when the absence of the anomaly is detected.
The action may be emergency braking of the vehicle or alerting a user to the anomaly. The action may be actively selecting the digital image for transmission to a back-end computer for further processing, or not selecting it for that purpose.
Further processing may comprise machine learning with the digital image or testing, verifying or validating a machine learning system with the digital image.
The action may be capturing another digital image without forwarding the digital image in case the anomaly is detected, or directly transmitting the evaluated digital image, in particular to a back-end, otherwise.
The action may be sorting out the evaluated digital image for labelling in case the anomaly is detected.
The step 214 is in the example executed to determine the semantic segmentation ŷ and the evaluation regarding the existence or absence of an anomaly in the digital image x.
The step 214 may be omitted if only the existence or absence of an anomaly is to be evaluated based on the quantization error Equant.
The amended model G represents an anomaly segmentation system.
A use case for this evaluation is to distinguish outliers from nominal data in semantic segmentations of digital images. This is particularly useful in safety relevant applications such as autonomous driving.
For example, the semantic segmentation is provided for detecting a presence of an object representing a road surface, a pedestrian, or a vehicle. An outlier may be an unknown object.
Another use case for this evaluation is to distinguish outliers from nominal data in automated optical inspection.
In one example, the set of quantizations Q̃ or the normalized set of quantizations Q is provided as a module. The module is inserted into an existing neural network that is already trained for semantic segmentation, as a penultimate layer of the neural network.
The set of quantizations Q̃ or the normalized set of quantizations Q may be added to another layer of the neural network.
Preferably, a vector quantization layer representing the set of quantizations Q̃ or the normalized set of quantizations Q is added.
The trained vector quantization layer is used to detect anomalies, e.g., by flagging input data as anomalous if its quantization error, i.e., the error determined by the vector quantization layer, is above the given threshold.
In one example, the only parameters that are retrained are those of the set of quantizations Q̃ or the normalized set of quantizations Q. These are typically much fewer parameters than those of the entire neural network. Thereby it becomes possible to incorporate the module for outlier detection into an already trained neural network.
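Retraining only the quantization parameters can be illustrated with a k-means-style codebook update on features of the frozen network (an illustrative sketch under that assumption; the description does not fix the update rule, and all names and values are invented for the example):

```python
import numpy as np

def update_codebook(features, codebook):
    # One k-means-style step: assign each feature of the frozen network to
    # its nearest codebook vector, then move each codebook vector to the
    # mean of its assigned features. Only the codebook changes.
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    assign = np.argmin(d, axis=1)
    new_cb = codebook.copy()
    for k in range(len(codebook)):
        members = features[assign == k]
        if len(members) > 0:
            new_cb[k] = members.mean(axis=0)
    return new_cb

rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.1, (50, 2)),   # frozen features, cluster A
                      rng.normal(5.0, 0.1, (50, 2))])  # frozen features, cluster B
cb = np.array([[0.5, 0.5], [4.0, 4.0]])                # initial codebook
cb = update_codebook(features, cb)                     # moves toward cluster means
```

The first part and the second part of the network are never touched; only the codebook vectors move, which is what keeps the retraining cheap.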
In one example, the amended model G is fine-tuned, or the last layer, i.e., a classifier layer of the amended model G, is retrained to further improve the performance.
The method may comprise upscaling the feature ze from a first scale to a second scale, determining the quantization zq in the second scale from the feature ze in the second scale, downscaling the quantization zq from the second scale to the first scale and determining the quantization error Equant in the first scale.
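The scale change may be sketched, for example, with nearest-neighbor upscaling and average-pooling downscaling (illustrative choices; the description does not fix the resampling method):

```python
import numpy as np

def upscale(f, s):
    # Nearest-neighbor upscaling of an (H, W, C) feature map by factor s.
    return f.repeat(s, axis=0).repeat(s, axis=1)

def downscale(f, s):
    # Average-pooling downscaling of an (H, W, C) feature map by factor s.
    h, w, c = f.shape
    return f.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3))

f1 = np.arange(8.0).reshape(2, 2, 2)  # feature ze at the first scale
f2 = upscale(f1, 2)                   # second scale, where zq is determined
back = downscale(f2, 2)               # quantization brought back to the first scale
```

The quantization is determined at the finer second scale and the quantization error is then evaluated at the original first scale.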
The amended model may be trained in a training with training data.
The training may comprise determining parameters that define the set of quantizations Q̃ and/or the normalization N. Preferably, the training comprises determining the parameters that define the set of quantizations Q̃ and/or the normalization N without modifying the first part and/or the second part of the predetermined model.
The training data may comprise pairs of a reference y(k) for the semantic segmentation ŷ(k) and a digital image x(k). The reference y(k) is for example a one-hot encoding of the label y.
Providing the set of quantizations Q̃ in the training comprises determining a quantization in the set of quantizations Q̃ depending on a difference between the reference y(k) and the semantic segmentation ŷ(k) and depending on the quantization error Equant.
The digital image comprises pixel boundaries between classes that are assigned in the semantic segmentation.
The set of quantizations Q̃ according to one example is a codebook comprising codebook vectors, i.e., the vectors in the set of quantizations Q̃.
Pixel boundaries between classes are for example regions with large quantization errors Equant, due to the absence of nearby codebook vectors.
This can cause large false positive rates in anomaly prediction. To mitigate this, the amended model G may be trained depending on an additional loss term, a boundary loss Lboundary, that overweighs the quantization error Equant on the class boundaries during training and encourages the allocation of codebook vectors on the class boundary.
Note that logit-based uncertainty methods cannot avoid larger uncertainties and thus anomaly scores when traversing pixel boundaries between classes, due to the continuity of the output of the network.
Let m denote an appropriately resized binary mask of class boundaries of the reference y, i.e., m=1 on class boundary regions and m=0 elsewhere. The boundary loss Lboundary is then for example the mean of the pointwise product m⊙Equant over the pixels, wherein ⊙ denotes the pointwise product.
The amended model G is trained using for example a loss function combining a segmentation loss, the quantization error Equant and the boundary loss Lboundary, wherein α, λ and β are hyperparameters weighting the respective terms. An anomaly score is for example computed from the quantization error Equant scaled up to the second scale.
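One plausible reading of the boundary loss, assuming it averages the pointwise product of the binary boundary mask m with the per-pixel quantization error Equant (the averaging is an assumption; mask and pointwise product are from the description), is:

```python
import numpy as np

def boundary_loss(e_quant, m):
    # Overweight the quantization error on class boundaries: average the
    # pointwise product of the binary boundary mask m and the error map.
    return (m * e_quant).mean()

e_quant = np.array([[0.1, 0.8],
                    [0.2, 0.9]])       # illustrative per-pixel errors
m = np.array([[0.0, 1.0],
              [0.0, 1.0]])             # boundary pixels in the right column
L_b = boundary_loss(e_quant, m)        # (0.8 + 0.9) / 4
```

Minimizing this term pushes codebook vectors toward boundary features, reducing spurious anomaly flags at class boundaries.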
Whether a pixel is classified as an outlier is decided for example by whether its quantization error Equant lies above or below the threshold. A value t of the threshold is for example 0≤t≤1.
According to one example, the amended model G is a neural network comprising the normalized set of quantizations Q. The neural network is trained with the loss function L. The threshold t is selected to determine the sensitivity. During inference, a digital image is passed through the neural network. At the layer representing the normalized set of quantizations Q, the quantization error Equant is computed. Pixels that correspond to a quantization error Equant>t are labelled as anomalous, the remaining pixels as nominal. The quantization error Equant may be upscaled to the original image resolution.
A neural network comprising the set of quantizations Q̃ instead of the normalized set of quantizations Q is processed in the same way.