Embodiments presented herein relate to a method, a system, a computer program, and a computer program product for retraining a pre-trained object classifier.
Object recognition is a general term to describe a collection of related computer vision tasks that involve identifying objects in image frames. One such task is object classification that involves predicting the class of an object in an image frame. Another such task is object localization that refers to identifying the location of one or more objects in the image frame, and optionally providing a bounding box enclosing the thus localized object. Object detection can be regarded as a combination of these two tasks and thus both localizes and classifies one or more objects in an image frame.
One way to perform object classification is to compare an object to be classified with a set of template objects representing different object classes, and to then classify the object into the object class of the template object that, according to some metric, is most similar to the object. The classification can be based on a model that is trained with datasets of annotated objects. Models for object classification can be developed using Deep Learning (DL) techniques. The resulting models can be referred to as object classification DL models, or just models for short. Such models can be trained based on datasets that are continuously improved. Trained and updated models need to be sent, during firmware update or similar update processes, to systems using an old version of the models.
In general terms, a model is only as good as it is trained. Therefore, a certain amount of further training of the model would improve the classification. In other words, the more training data, the better (in terms of accuracy of the classification, the computational speed at which the classification can be made, etc.) the object classifier would perform. In order to capture as much training data as possible and to be as general as possible, models are traditionally static pre-trained generic models that are applied to sets of training data, regardless of the conditions under which the image frames in the training data have been captured. This generally requires the training data to be annotated. However, the cost of annotation, in terms of either manual labour or computer computations, as well as privacy regulations, might limit both the possibility to annotate large sets of training data and the acquisition of the actual training data itself.
US 2017/039455 A1 relates to a method for securing an environment.
Foroughi Homa et al: “Robust people counting using sparse representation and random projection”, Pattern Recognition, Vol. 48, No. 10, 1 Oct. 2015, pages 3038-3052, relates to a method for estimating the number of people present in an image for practical applications including visual surveillance and public resource management.
US 2021/042530 A1 relates to aspects of an artificial-intelligence (AI) powered ground truth generation for object detection and tracking on image sequences.
Hence, there is still a need for improved training of object classifiers.
In general terms, according to the herein disclosed concepts, improved training of an object classifier is achieved by retraining a pre-trained object classifier. In turn, this is achieved by annotating instances of a tracked object as belonging to an object class based on the fact that another instance of the same tracked object has already been verified to belong to that object class.
According to a first aspect, a method for retraining a pre-trained object classifier is performed by a system that comprises processing circuitry. The method comprises obtaining a stream of image frames of a scene. Each of the image frames depicts an instance of a tracked object. The tracked object is one and the same object being tracked when moving in the scene. The method comprises classifying, with a level of confidence, each instance of the tracked object to belong to an object class. The method comprises verifying that the level of confidence for at least one of the instances of the tracked object, and for only one object class, is higher than a threshold confidence value. It can thereby be ensured that the at least one of the instances of the tracked object is classified with high confidence to only one object class. The method comprises, when so, annotating all instances of the tracked object in the stream of image frames as belonging to only one object class (i.e., the object class for which the level of confidence for the at least one of the instances of the tracked object is higher than the threshold confidence value) with high confidence, yielding annotated instances of the tracked object. The method comprises retraining the pre-trained object classifier with at least some of the annotated instances of the tracked object.
According to a second aspect, the concepts are defined by a system for retraining a pre-trained object classifier. The system comprises processing circuitry. The processing circuitry is configured to cause the system to obtain a stream of image frames of a scene. Each of the image frames depicts an instance of a tracked object. The tracked object is one and the same object being tracked when moving in the scene. The processing circuitry is configured to cause the system to classify, with a level of confidence, each instance of the tracked object to belong to an object class. The processing circuitry is configured to cause the system to verify that the level of confidence for at least one of the instances of the tracked object, and for only one object class, is higher than a threshold confidence value. It can thereby be ensured that the at least one of the instances of the tracked object is classified with high confidence to only one object class. The processing circuitry is configured to cause the system to, when so, annotate all instances of the tracked object in the stream of image frames as belonging to only one object class (i.e., the object class for which the level of confidence for the at least one of the instances of the tracked object is higher than the threshold confidence value) with high confidence, yielding annotated instances of the tracked object. The processing circuitry is configured to cause the system to retrain the pre-trained object classifier with at least some of the annotated instances of the tracked object.
According to a third aspect, the concepts are defined by a system for retraining a pre-trained object classifier. The system comprises an obtainer module configured to obtain a stream of image frames of a scene. Each of the image frames depicts an instance of a tracked object. The tracked object is one and the same object being tracked when moving in the scene. The system comprises a classifier module configured to classify, with a level of confidence, each instance of the tracked object to belong to an object class. The system comprises a verifier module configured to verify that the level of confidence for at least one of the instances of the tracked object, and for only one object class, is higher than a threshold confidence value. It can thereby be ensured that the at least one of the instances of the tracked object is classified with high confidence to only one object class. The system comprises an annotator module configured to annotate all instances of the tracked object in the stream of image frames as belonging to only one object class (i.e., the object class for which the level of confidence for the at least one of the instances of the tracked object is higher than the threshold confidence value) with high confidence, yielding annotated instances of the tracked object. The system comprises a re-trainer module configured to re-train the pre-trained object classifier with at least some of the annotated instances of the tracked object.
According to a fourth aspect, the concepts are defined by a computer program for retraining a pre-trained object classifier, the computer program comprising computer program code which, when run on a system, causes the system to perform a method according to the first aspect.
According to a fifth aspect, the concepts are defined by a computer program product comprising a computer program according to the fourth aspect and a computer readable storage medium on which the computer program is stored. The computer readable storage medium could be a non-transitory computer readable storage medium.
Advantageously, these aspects provide improved training of a pre-trained object classifier.
The annotated instances of the tracked object can be used for training, or retraining, of object classifiers other than the pre-trained object classifier mentioned in the above aspects. Hence, advantageously, these aspects provide means for automatically generating large sets of annotated training data from sets of only partially annotated training data.
Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims as well as from the drawings.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to “a/an/the element, apparatus, component, means, module, step, etc.” are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, module, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
The concepts will now be described, by way of example, with reference to the accompanying drawings, in which:
The concept will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments are shown. These concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the concepts to those skilled in the art. Like numbers refer to like elements throughout the description. Any step or feature illustrated by dashed lines should be regarded as optional.
As noted above there is still a need for improved training of object classifiers. In general terms, the classification result of object classifiers of objects might depend on the angle at which the image frames of the object are captured, the lighting conditions at which the image frames of the object are captured, the image resolutions of the image frames, etc. Some angles, lighting conditions, image resolutions, etc. thus result in higher confidence levels than other angles, lighting conditions, image resolutions, etc.
This means that when a tracked object is moving along a path through the scene, it is likely that the tracked object, during some parts of the path, will be comparatively easy to classify by the used object classifier, and that the object, during some other parts of the path, will be comparatively difficult to classify using the same object classifier, due to changing angles, lighting conditions, etc. However, this fact cannot be taken advantage of since object trackers and object classifiers typically operate independently of each other.
Tracked objects that are comparatively easy to classify generally result in high confidence values, whereas tracked objects that are comparatively difficult to classify generally result in low confidence values for the object in that specific image frame. Since it is unlikely that an object will change from one object class to another, e.g., from “person” to “truck”, during its movement through the scene, this means that if an object at any one instance is classified with a high confidence level, e.g., as a “person”, the underlying physical object which the tracked object represents will remain a “person” even if the tracked object at a later stage in its path, and thus in a later image frame, is more difficult to classify, e.g., the tracked object is still classified as a “person” but only with low confidence. This realization is utilized by the herein disclosed concepts. It is here noted that it is the task of the object classifier to classify tracked objects as belonging to any given object class with some level of confidence. The herein disclosed embodiments are based on the assumption that the object classifier is behaving correctly and does not provide false positives in this regard, which thus represents the normal behaviour of any object classifier.
In particular, according to the herein disclosed concepts, this realization is utilized to generate new and annotated training data for a pre-trained object classifier. At least some of the herein disclosed embodiments therefore relate to automated gathering and annotation of training data for refined object classification, either in general or for a specific camera installation. The embodiments disclosed herein in particular relate to mechanisms for retraining a pre-trained object classifier. In order to obtain such mechanisms, there is provided a system, a method performed by the system, and a computer program product comprising code, for example in the form of a computer program, that when run on a system causes the system to perform the method.
In some examples each camera device 120a, 120b is a digital camera device and/or capable of pan, tilt and zoom (PTZ) and can thus be regarded as a (digital) PTZ camera device. The system 110 is configured to communicate with a user interface 130 for displaying the captured image frames. Further, the system 110 is configured to encode the image frames such that they can be decoded using any known video coding standard, such as any of: High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2; Advanced Video Coding (AVC), also known as H.264 and MPEG-4 Part 10; Versatile Video Coding (VVC), also known as H.266, MPEG-I Part 3 and Future Video Coding (FVC); VP9, VP10 and AOMedia Video 1 (AV1), just to give some examples. In this respect, the encoding might be performed either directly in conjunction with the camera devices 120a, 120b capturing the image frames or at another entity, and the encoded image frames may then, at least temporarily, be stored in a database 116. The system 110 further comprises a first entity 114a and a second entity 114b. Each of the first entity 114a and the second entity 114b might be implemented in either the same or in separate computing devices. The first entity 114a might e.g., be configured to obtain a stream of image frames of the scene 160 and to provide the second entity 114b with instances of the tracked object 170. Further details of the first entity 114a will be disclosed with reference to
In some aspects, the first entity 114a, the second entity 114b, and the database 116 define a video management system (VMS). The first entity 114a, the second entity 114b, and the database 116 are therefore considered as part of the system 110. The first entity 114a, the second entity 114b, and the database 116 are operatively connected to the camera devices 120a, 120b over a network 112. The network 112 might be wired, wireless, or partly wired and partly wireless. In some examples the system 110 comprises a communication interface 520 (as in
Embodiments for retraining a pre-trained object classifier 114b will now be disclosed with parallel reference to
S102: A stream of image frames of a scene 160 is obtained. The stream of image frames might be obtained by obtainer module 410. Each of the image frames depicts an instance of a tracked object 170. The tracked object 170 is one and the same object 170 being tracked when moving in the scene 160. The stream of image frames might be obtained by obtainer module 410 from at least one camera device 120a, 120b, as in
S104: Each instance of the tracked object 170 is classified, with a level of confidence, to belong to an object class. The classification might be performed by classifier module 420.
S106: It is verified that the level of confidence for at least one of the instances of the tracked object 170, and for only one object class, is higher than a threshold confidence value. The verification might be performed by verifier module 430. It can thereby be ensured that the at least one of the instances of the tracked object 170 is classified with high confidence to only one object class. The level of confidence for the instance of the tracked object 170 that is used as reference thus needs to be higher than a threshold confidence value for only one object class. This threshold confidence value can be adjusted upwards or downwards as desired to change the accuracy of the retraining of the pre-trained object classifier 114b. For example, setting a very high threshold confidence value (such as 0.95 on a scale from 0.0 to 1.0) yields high accuracy. Furthermore, if the produced amount of false labels is observed to be higher than desired, the threshold confidence value can be adjusted upwards to increase the accuracy of the retraining of the pre-trained object classifier 114b.
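By way of a non-limiting illustration, the verification in S106 might be sketched as follows, where the representation of each classified instance as a mapping from object class to confidence, the function name, and the 0.95 threshold are merely illustrative assumptions and not mandated by the embodiments:

```python
def verify_high_confidence(instances, threshold=0.95):
    """Return the single object class for which at least one instance of
    the tracked object was classified with a confidence higher than the
    threshold, or None if no instance qualifies.

    Each instance is a dict mapping object class -> level of confidence
    (a hypothetical representation of the classifier output)."""
    for scores in instances:
        # Object classes whose confidence exceeds the threshold for
        # this instance of the tracked object.
        confident = [cls for cls, conf in scores.items() if conf > threshold]
        # The instance qualifies only if exactly one class is above
        # the threshold (high confidence for only one object class).
        if len(confident) == 1:
            return confident[0]
    return None

# One instance of the track is a confident "person"; the others are not.
track = [
    {"person": 0.55, "truck": 0.30},
    {"person": 0.97, "truck": 0.01},
    {"person": 0.40, "truck": 0.35},
]
print(verify_high_confidence(track))  # -> person
```

Returning the verified object class, rather than a mere yes/no, lets the subsequent annotation step reuse it directly.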
S118: All instances of the tracked object 170 in the stream of image frames are annotated as belonging to only one object class (i.e., the object class for which the level of confidence for the at least one of the instances of the tracked object 170 is higher than the threshold confidence value) with high confidence. This annotation yields annotated instances of the tracked object 170. The annotation might be performed by annotator module 440.
If it can be successfully verified that the level of confidence for at least one of the instances of the tracked object 170 is higher than the threshold confidence value, one or more instances of the same object in other image frames but only having been classified with a medium or low confidence level can thus be extracted and annotated as if classified with high confidence.
S118 is in this respect thus only entered when it can be successfully verified that the level of confidence for at least one of the instances of the tracked object 170 is higher than the threshold confidence value in S106. As noted above, by means of S106 it can be ensured that the tracked object 170 is classified with a high confidence to only one object class before S118 is entered.
S122: The pre-trained object classifier 114b is re-trained with at least some of the annotated instances of the tracked object 170. The re-training might be performed by re-trainer module 460.
The pre-trained object classifier 114b can thus be re-trained with additional annotated instances of the tracked object 170. If this is done for an object classification model, the object classification model will therefore, over time, be able to provide improved classification of objects in areas of the scene 160 where classification was originally harder, i.e., where the classifications were given low or medium confidence. An example of this will be provided below with reference to
Hence, if the tracked object 170 in S106 is verified to be classified as belonging to only one object class with high confidence in at least one image frame, the same tracked object 170 from other image frames is used for re-training of the pre-trained object classifier 114b. Likewise, the tracked object 170 in other image frames will not be used for re-training of the pre-trained object classifier 114b if the tracked object 170 is never classified as belonging to one object class with high confidence.
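The overall flow of S104, S106 and S118 can be sketched as below; `classify`, the crop-based track representation, and the threshold are hypothetical placeholders standing in for the pre-trained object classifier 114b and the tracker output, not a definitive implementation:

```python
def annotate_track(track, classify, threshold=0.95):
    """Annotate every instance of one tracked object with the single
    object class that at least one instance was classified to with high
    confidence (S106 + S118); return [] if no instance ever qualifies.

    track: list of image crops depicting the same tracked object.
    classify: maps a crop to a dict of object class -> confidence
    (a stand-in for the pre-trained object classifier)."""
    verified_class = None
    for crop in track:
        scores = classify(crop)
        confident = [c for c, p in scores.items() if p > threshold]
        if len(confident) == 1:  # high confidence for only one class
            verified_class = confident[0]
            break
    if verified_class is None:
        # The track is never classified with high confidence: it is
        # not used for re-training.
        return []
    # Propagate the verified object class to all instances of the
    # tracked object, including the low/medium-confidence ones.
    return [(crop, verified_class) for crop in track]

# Toy classifier: the crops here are simply precomputed score dicts.
track = [{"person": 0.5}, {"person": 0.98}, {"person": 0.3}]
annotated = annotate_track(track, classify=lambda crop: crop)
print([label for _, label in annotated])  # -> ['person', 'person', 'person']
```

The resulting `(crop, label)` pairs are the annotated instances with which the pre-trained object classifier can then be retrained in S122.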
Embodiments relating to further details of retraining a pre-trained object classifier 114b as performed by the system 110 will now be disclosed. As indicated in
There may be further verifications made before all instances of the tracked object 170 in the stream of image frames are annotated as belonging to the object class, and thus before S118 is entered. Different embodiments relating thereto will now be described in turn.
It could be that some tracked objects 170 are classified to belong to two or more object classes, each classification having its own level of confidence. In some aspects, it is therefore verified that only one of these object classes has a high level of confidence. Hence, in some embodiments, at least some of the instances of the tracked object 170 are classified to also belong to a further object class with a further level of confidence, and the method further comprises:
S108: It is verified that the further level of confidence is lower than the threshold confidence value for the at least some of the instances of the tracked object 170. The verification might be performed by verifier module 430. Hence, if the tracked object 170 is classified with high confidence to two or more different object classes, the tracked object 170 will not be used for re-training of the pre-trained object classifier 114b. The same is true, i.e., the tracked object 170 will not be used for re-training, if the tracked object 170 is never classified with high confidence to any object class.
In some aspects, the path along which the tracked object 170 moves from one image frame to the next is also tracked. If the tracked object 170 is classified as belonging to only one object class with high confidence at least once, the path can be used for re-training of the pre-trained object classifier 114b. Hence, in some embodiments, the tracked object 170 moves along a path in the stream of image frames, and the path is tracked when the tracked object 170 is tracked.
In some aspects, it is then verified that the path itself can be tracked with high accuracy. That is, in some embodiments, the path is tracked at a level of accuracy, and the method further comprises:
S110: It is verified that the level of accuracy is higher than a threshold accuracy value. The verification might be performed by verifier module 430.
In some aspects, it is verified that the path has neither split nor merged. If the path has split and/or merged at least once, this could be an indication that the level of accuracy at which the path is tracked is not higher than the threshold accuracy value. This may be the case if it is suspected that the path has been merged from two or more other paths or is split into two or more other paths. If so, the path is determined to be of low accuracy, and it will not be used for re-training of the pre-trained object classifier 114b. Hence, in some embodiments, the method further comprises:
S112: It is verified that the path has neither split into at least two paths nor merged from at least two paths within the stream of image frames. In other words, it is verified that the path is free from any splits or merges, and thus constitutes one single path. The verification might be performed by verifier module 430.
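Assuming, purely for illustration, that the tracker reports per path the identifiers of any paths it split into or merged from (a hypothetical tracker output format), the verification in S112 reduces to checking that both lists are empty:

```python
def path_is_single(split_into, merged_from):
    """Verify that the path is free from any splits or merges and thus
    constitutes one single path (S112).

    split_into: identifiers of paths this path split into, if any.
    merged_from: identifiers of paths this path was merged from, if any.
    """
    # A single path has neither split into nor merged from other paths.
    return not split_into and not merged_from

print(path_is_single([], []))          # -> True
print(path_is_single(["path-7"], []))  # -> False (the path has split)
```

Only tracks for which this check succeeds would be passed on towards annotation in S118; the others are discarded as being of low accuracy.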
The same principles can also be applied if it is suspected that the tracked object 170 has odd size behaviour that can be suspected to be caused by shadowing, mirror effects, or similar. If so, the tracked object 170 is assumed to be classified with low confidence, and it will not be used for re-training of the pre-trained object classifier 114b. Hence, in some embodiments, the tracked object 170 has a size in the image frames, and the method further comprises:
S114: It is verified that the size of the tracked object 170 does not change more than a threshold size value within the stream of image frames. The verification might be performed by verifier module 430. Since the apparent size of the tracked object 170 depends on the distance between the tracked object 170 and the camera device 120a, 120b, a compensation of the size of the tracked object 170 with respect to this distance can be made as part of the verification in S114. In particular, in some embodiments, the size of the tracked object 170 is adjusted by a distance-dependent compensation factor, determined as a function of the distance between the tracked object 170 and the camera device 120a, 120b having captured the stream of image frames of the scene 160, when verifying that the size of the tracked object 170 does not change more than the threshold size value within the stream of image frames 210a:210c.
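One possible sketch of the size verification in S114, including the distance-dependent compensation, is given below. The linear compensation (apparent size scaling roughly as the inverse of the camera-to-object distance) and the ratio-based threshold are illustrative assumptions; the embodiments do not prescribe a particular compensation function:

```python
def size_is_stable(sizes, distances, threshold_ratio=1.5):
    """Verify that the distance-compensated size of a tracked object
    does not change more than a threshold within the stream (S114).

    sizes: apparent object sizes (e.g. bounding-box heights) per frame.
    distances: estimated camera-to-object distance per frame.
    threshold_ratio: maximum allowed max/min ratio of compensated sizes.
    """
    # Compensate apparent size for distance: an object twice as far
    # away appears (roughly) half as large in the image frame, so
    # multiplying by the distance approximately cancels this effect.
    compensated = [s * d for s, d in zip(sizes, distances)]
    return max(compensated) / min(compensated) <= threshold_ratio

# An object moving away: apparent height halves as distance doubles.
print(size_is_stable([100, 66, 50], [10, 15, 20]))   # -> True
# A sudden size jump (e.g. a shadow merging with the object).
print(size_is_stable([100, 250, 90], [10, 10, 10]))  # -> False
```

A track failing this check would be treated as classified with low confidence and excluded from the re-training, as described above.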
As illustrated in, and as disclosed with reference to,
In some aspects, for example when the stream of image frames originates from image frames having been captured by at least two camera devices 120a, 120b, but also in other examples, there is a risk that the tracking of the tracked object 170 is lost and/or that the classification of the tracked object 170 changes from one image frame to the next. Therefore, in some embodiments, the method further comprises:
S116: It is verified that the object class of the instances of the tracked object 170 does not change within the stream of image frames. The verification might be performed by verifier module 430.
In some aspects, in order to avoid training bias (e.g., machine learning bias, algorithm bias, or artificial intelligence bias), the pre-trained object classifier 114b is not re-trained using any tracked objects 170 already classified with a high confidence level. There are different ways to accomplish this. A first way is to explicitly exclude the tracked objects 170 already classified with a high confidence level from the re-training. In particular, in some embodiments, the pre-trained object classifier 114b is retrained only with the annotated instances of the tracked object 170 for which the level of confidence was verified to not be higher than the threshold confidence value. A second way is to set a low weighting value for the tracked objects 170 already classified with a high confidence level during the re-training. In this way, the tracked objects 170 already classified with a high confidence level can be implicitly excluded from the re-training. That is, in some embodiments, each of the annotated instances of the tracked object 170 is assigned a respective weighting value according to which the annotated instances of the tracked object 170 are weighted when the pre-trained object classifier 114b is retrained, and the weighting value of the annotated instances of the tracked object 170 for which the level of confidence was verified to be higher than the threshold confidence value is lower than the weighting value of the annotated instances of the tracked object 170 for which the level of confidence was verified to not be higher than the threshold confidence value. Thereby, by means of the weighting values, the object classifier 114b will during its re-training be less influenced by the tracked objects 170 already classified with a high confidence level.
There could be further uses of the annotated instances of the tracked object 170 beyond re-training the pre-trained object classifier 114b. In some aspects, the annotated instances of the tracked object 170 might be collected at a database 116 and/or be provided to a further device 118. Hence, in some embodiments, the method further comprises:
S120: The annotated instances of the tracked object 170 are provided to a database 116 and/or a further device 118. The annotated instance might be provided to the database 116 and/or the further device 118 by provider module 450. This enables also other pre-trained object classifiers to benefit from the annotated instances of the tracked object 170.
Reference is next made to
Particularly, the processing circuitry 510 is configured to cause the system 110 to perform a set of operations, or steps, as disclosed above. For example, the storage medium 530 may store the set of operations, and the processing circuitry 510 may be configured to retrieve the set of operations from the storage medium 530 to cause the system 110 to perform the set of operations. The set of operations may be provided as a set of executable instructions.
Thus, the processing circuitry 510 is thereby arranged to execute methods as herein disclosed. The storage medium 530 may also comprise persistent storage, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory. The system 110 may further comprise a communications interface 520 at least configured for communications with other entities, functions, nodes, and devices. As such the communications interface 520 may comprise one or more transmitters and receivers, comprising analogue and digital components. The processing circuitry 510 controls the general operation of the system 110, e.g., by sending data and control signals to the communications interface 520 and the storage medium 530, by receiving data and reports from the communications interface 520, and by retrieving data and instructions from the storage medium 530. Other components, as well as the related functionality, of the system 110 are omitted in order not to obscure the concepts presented herein.
In the example of
The inventive concept has mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the inventive concept, as defined by the appended patent claims.
Number | Date | Country | Kind |
---|---|---|---|
21214232.7 | Dec 2021 | EP | regional |