This application claims the priority benefit of Chinese Patent Application No. 202210887240.2, filed on Jul. 26, 2022 in the China National Intellectual Property Administration, the disclosure of which is incorporated herein in its entirety by reference.
The present disclosure relates generally to information processing and computer vision, and more particularly, to a method, a device and a storage medium for post-processing in multi-target tracking.
With the development of computer science and artificial intelligence, it is becoming increasingly universal and effective to use computers to run artificial intelligence models based on neural networks to implement information processing. Computer vision is an important application field of artificial intelligence models.
A branch of computer vision technology is multi-target tracking. Multi-target tracking is commonly abbreviated as MTT (Multiple Target Tracking; sometimes also as MOT: Multiple Object Tracking). It is used to detect (locate) and assign identifications (IDs) to targets of types of interest, such as pedestrians, automobiles and/or animals in a video, so as to perform trajectory tracking, without knowing the number of the targets in advance. A desired tracking result is that the same target (e.g., a certain person) in multiple frames of images in a video is identified with the same ID, and different targets are identified with different IDs, so as to support subsequent work such as trajectory prediction, precise searching and the like. MTT is a key technology in the field of computer vision, and has been widely applied in fields such as autonomous driving, intelligent monitoring, behavior recognition and the like.
In multi-target tracking, for an input video, a tracking result of targets is output. A tracking result can be displayed by imaging. For example, in a tracking result image, each target is indicated by, for example, a rectangular bounding box with a corresponding ID identification number and/or color. In an image sequence of multiple frames of a video, a moving trajectory of a bounding box of the same ID can be regarded as a trajectory of a target of the ID, and each trajectory point on the trajectory corresponds to a corresponding image patch. In these multiple frames, an image patch sequence of multiple image patches indicated by the bounding box of the ID is referred to as a tracklet. It is possible to determine time information and location information of each image patch in a tracklet. The time information can be a time when a target is at a location as shown by the image patch, i.e., a photographing time t of an image; and the location information can be a location (referred to as “image coordinate system location”) of the image patch in the image at the time t, and/or a location (referred to as “actual coordinate system location”) of the target in a real space at the time t.
The adverse factors affecting the accuracy of a result of multi-target tracking include: occlusion, target overlapping, illumination, attitude changes, etc. It is challenging to improve the accuracy of a result of multi-target tracking.
A brief summary of the present disclosure will be given below to provide a basic understanding of some aspects of the present disclosure. It should be understood that the summary is not an exhaustive summary of the present disclosure. It does not intend to define a key or important part of the present disclosure, nor does it intend to limit the scope of the present disclosure. The object of the summary is only to briefly present some concepts, which serves as a preamble of the detailed description that follows.
In order to improve the accuracy of multi-target tracking, it is possible to perform post-processing on a tracking result (e.g., a tracklet indicating a trajectory of a single target) outputted by a multi-target tracking model. A circumstance of reducing the accuracy of multi-target tracking is that: in an image patch sequence SqPatch of a tracklet indicating a trajectory of a single target Tg[x], image patches of different targets actually appear. For example, a target in Patch[i] is Tg[x], while another target in Patch[i+1] is Tg[x′]. This is referred to as identification-switch (id-switch). The occurrence of identification-switch means the appearance of an incorrect trajectory. In a tracklet, identification-switch may occur two, three, or even more times. The technical problems to be solved by embodiments of the present disclosure include but are not limited to at least one of: reducing identification-switch, and suppressing the appearance of an incorrect trajectory.
According to an aspect of the present disclosure, there is provided a computer-implemented method for post-processing in multi-target tracking. The method comprises making attempts to split a tracklet indicative of a trajectory of a single target by performing operations of: determining a re-identification feature set of an image patch sequence by determining a re-identification feature of each image patch in the image patch sequence of the tracklet; determining whether a candidate identification switch image patch is present in the tracklet based on feature similarities of a plurality of re-identification feature pairs in the re-identification feature set; in a case where a determination result is “yes”, verifying whether it is credible that identification-switch has occurred at the candidate identification switch image patch; and in a case where a verification result is “credible”, splitting the tracklet into two tracklets based on the candidate identification switch image patch.
According to an aspect of the present disclosure, there is provided a device for post-processing in multi-target tracking. The device comprises: a memory having instructions stored thereon; and at least one processor connected with the memory and configured to execute the instructions to make attempts to split a tracklet indicative of a trajectory of a single target by performing operations of: determining a re-identification feature set of an image patch sequence by determining a re-identification feature of each image patch in the image patch sequence of the tracklet; determining whether a candidate identification switch image patch is present in the tracklet based on feature similarities of a plurality of re-identification feature pairs in the re-identification feature set; in a case where a determination result is “yes”, verifying whether it is credible that identification-switch has occurred at the candidate identification switch image patch; and in a case where a verification result is “credible”, splitting the tracklet into two tracklets based on the candidate identification switch image patch.
According to an aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having a program stored thereon. The program, when executed, causes a computer to make attempts to split a tracklet indicative of a trajectory of a single target by performing operations of: determining a re-identification feature set of an image patch sequence by determining a re-identification feature of each image patch in the image patch sequence of the tracklet; determining whether a candidate identification switch image patch is present in the tracklet based on feature similarities of a plurality of re-identification feature pairs in the re-identification feature set; in a case where a determination result is “yes”, verifying whether it is credible that identification-switch has occurred at the candidate identification switch image patch; and in a case where a verification result is “credible”, splitting the tracklet into two tracklets based on the candidate identification switch image patch.
The beneficial effects of the methods, devices and storage media of the present disclosure include at least one of: reducing identification-switch, and improving the accuracy of multi-target tracking.
Embodiments of the present disclosure will be described below with reference to the accompanying drawings, which will help to more easily understand the above and other objects, features and advantages of the present disclosure. The accompanying drawings are merely intended to illustrate the principles of the present disclosure. The sizes and relative positions of units are not necessarily drawn to scale in the accompanying drawings. The same reference numbers may denote the same features. In the accompanying drawings:
Hereinafter, exemplary embodiments of the present disclosure will be described in conjunction with the accompanying drawings. For the sake of clarity and conciseness, the specification does not describe all features of actual embodiments. However, it should be understood that many decisions specific to the embodiments may be made in developing any such actual embodiment, so as to achieve specific objects of a developer, and these decisions may vary from one embodiment to another.
It should also be noted herein that, to avoid the present disclosure from being obscured due to unnecessary details, only those device structures closely related to the solution according to the present disclosure are shown in the accompanying drawings, while other details not closely related to the present disclosure are omitted.
It should be understood that, the present disclosure will not be limited only to the described embodiments due to the following description with reference to the accompanying drawings. Herein, where feasible, embodiments may be combined with each other, features may be substituted or borrowed between different embodiments, and one or more features may be omitted in one embodiment.
Computer program code for performing operations of various aspects of embodiments of the present disclosure can be written in any combination of one or more programming languages, the programming languages including object-oriented programming languages, such as Java, Smalltalk, C++ and the like, and further including conventional procedural programming languages, such as “C” programming language or similar programming languages.
Methods of the present disclosure can be implemented by circuitry having corresponding functional configurations. The circuitry includes circuitry for a processor.
An aspect of the present disclosure relates to a method for post-processing in Multi-Target Tracking (MTT). The method can be implemented with a computer. Exemplary description of a method 100 for post-processing of the present disclosure will be made with reference to
In operation S101, a re-identification feature set Fs[i] of an image patch sequence SqPatch[i] is determined by determining a re-identification feature (generally represented as F[j]) of each image patch in the image patch sequence SqPatch[i] of the tracklet Trk[i] indicative of the trajectory of the single target Tg[x]. A target identification attribute of the tracklet Trk[i] is Trk[i].id=x, that is, the tracklet Trk[i] is a tracklet for the target Tg[x] which is given by multi-target tracking. For a segment of video, if a plurality of targets appear therein, multi-target tracking can give a plurality of tracklets corresponding to a plurality of targets. Referring to
Referring to
In a case where a determination result is “yes”, operation S105 is performed to verify whether it is credible that identification-switch has occurred at the candidate identification switch image patch Patch_sc.
In a case where a verification result is “credible”, operation S107 is performed to split the tracklet into two tracklets based on the candidate identification switch image patch Patch_sc. For example, if the image patch sequence SqPatch[i] of the tracklet Trk[i] is represented as Patch[jStart], . . . . . . , Patch[j], . . . . . . , Patch[jEnd], wherein, the candidate identification switch image patch Patch_sc=Patch[js], then Patch[jStart], . . . . . . , Patch[j], . . . . . . , Patch[jEnd] is split into: a first tracklet Trk_1: Patch[jStart], . . . . . . , Patch[js]; and a second tracklet Trk_2: Patch[js+1], . . . . . . , Patch[jEnd]. The “other processing” in
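The splitting in operation S107 can be sketched as follows. This is a minimal illustration only: the function name `split_tracklet`, the use of a plain Python list for the image patch sequence, and the 0-based position `js` (counted from Patch[jStart]) are assumptions made for the example and are not part of the disclosure.

```python
from typing import List, Sequence, Tuple, TypeVar

P = TypeVar("P")


def split_tracklet(patches: Sequence[P], js: int) -> Tuple[List[P], List[P]]:
    """Split a patch sequence right after 0-based position js.

    The candidate patch Patch[js] stays in the first tracklet, matching
    Trk_1 = Patch[jStart..js] and Trk_2 = Patch[js+1..jEnd].
    """
    if not 0 <= js < len(patches) - 1:
        raise ValueError("js must leave at least one patch in each tracklet")
    return list(patches[: js + 1]), list(patches[js + 1 :])


# Toy sequence of five patches with the candidate at position 2.
trk_1, trk_2 = split_tracklet(["p0", "p1", "p2", "p3", "p4"], js=2)
```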
Further exemplary description of the details of the method 100 will be made below.
It is possible to determine a candidate identification switch image patch based on similarities of adjoining image patch pairs. A method for determining a candidate identification switch image patch according to an embodiment of the present disclosure will be exemplarily described with reference to
In operation S301, feature similarities of re-identification feature pairs of a plurality of adjoining image patch pairs in the image patch sequence SqPatch[i] are determined. For example, SqPatch[i] comprises the image patches Patch[jStart], . . . . . . Patch[j], . . . . . . , Patch[jEnd], then jmax=jEnd-jStart adjoining image patch pairs can be obtained, and a j-th feature similarity Sim[j] is a similarity Sim(F[j+1], F[j]) between a re-identification feature F[j+1] of an image patch Patch[j+1] and a re-identification feature F[j] of an image patch Patch[j]. The similarity can be a cosine similarity between re-identification features.
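Operation S301 can be illustrated by the following sketch, which computes the cosine similarity Sim[j] = Sim(F[j+1], F[j]) for every adjoining pair. The helper names and the representation of a re-identification feature as a list of floats are assumptions for illustration; positions are 0-based, with position 0 standing for Patch[jStart].

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def adjacent_similarities(features: Sequence[Sequence[float]]) -> List[float]:
    """Return Sim[j] = Sim(F[j+1], F[j]) for each adjoining patch pair."""
    return [
        cosine_similarity(features[j + 1], features[j])
        for j in range(len(features) - 1)
    ]


# Three toy features: the last one points in a different direction.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
sims = adjacent_similarities(feats)
```

A sequence of n image patches yields n−1 similarities, in agreement with jmax=jEnd−jStart.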
In operation S303, it is determined whether a candidate identification switch image patch is present in the tracklet Trk[i] according to whether a special feature similarity Simp less than a predetermined similarity threshold sTh is present in the plurality of feature similarities. In an example, when a special feature similarity Simp is present, it is determined that a candidate identification switch image patch is present in the tracklet Trk[i], and an image patch associated with the special feature similarity Simp is designated as the candidate identification switch image patch. For example, when the special feature similarity Simp is Sim[j] (i.e., Sim[j]<sTh), the image patch Patch[j] is designated as the candidate identification switch image patch.
Optionally, it is possible to find at a time all special feature similarities in the plurality of feature similarities, and to designate image patches associated therewith as candidate identification switch image patches.
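The thresholding of operation S303, including the optional variant that finds all special feature similarities at once, can be sketched as below. The function name and the threshold value 0.5 are hypothetical; the disclosure does not state a value for sTh.

```python
from typing import List, Sequence


def find_candidate_indices(sims: Sequence[float], s_th: float) -> List[int]:
    """Return every position j with Sim[j] < sTh.

    Patch[j] is then a candidate identification switch image patch.
    """
    return [j for j, s in enumerate(sims) if s < s_th]


# Toy similarities; s_th=0.5 is an assumed value, not taken from the text.
candidates = find_candidate_indices([0.92, 0.88, 0.31, 0.90], s_th=0.5)
has_candidate = len(candidates) > 0
```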
In an embodiment, it is possible to determine a candidate identification switch image patch based on a global similarity matrix determined from similarities of all image patch pairs in the image patch sequence.
In operation S501, a global similarity matrix GS representing similarities between respective image patches in the image patch sequence is generated based on feature similarities of the plurality of re-identification feature pairs in the re-identification feature set Fs. An element s(j,j′) in the global similarity matrix GS is a similarity Sim(F[j], F[j′]) between the re-identification features F[j], F[j′], where j, j′ ∈ [jStart, jEnd].
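A minimal sketch of building the global similarity matrix GS is given below. It assumes cosine similarity (the similarity named for the adjoining-pair embodiment) and represents GS as a list of lists; both choices, and the function name, are assumptions for illustration.

```python
import math
from typing import List, Sequence


def global_similarity_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build GS with s(j, j') = cosine similarity of F[j] and F[j']."""
    n = len(features)
    norms = [math.sqrt(sum(x * x for x in f)) for f in features]
    gs = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            dot = sum(x * y for x, y in zip(features[a], features[b]))
            denom = norms[a] * norms[b]
            sim = dot / denom if denom > 0.0 else 0.0
            gs[a][b] = gs[b][a] = sim  # the matrix is symmetric
    return gs


# Two patches of one target followed by one patch of another target.
gs = global_similarity_matrix([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

When an identification-switch is present, GS tends to show two high-similarity blocks along the main diagonal with low similarity between them.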
In operation S503, it is determined whether the candidate identification switch image patch is present in the tracklet Trk[i] based on the global similarity matrix GS.
In operation S701, a Gaussian checkerboard kernel KG (k,l) is determined based on a common checkerboard kernel Kbox and a two-dimensional Gaussian function Ø(k,l). An example of a 5*5 common checkerboard kernel Kbox is as shown by Equation (1).
The two-dimensional Gaussian function Ø(k, l) is as shown by Equation (2), where, ε is a parameter of the two-dimensional Gaussian function.
An element kG(k, l) of the Gaussian checkerboard kernel KG(k, l) is as shown by Equation (3).
kG(k,l) = kbox(k,l) * Ø(k,l)   (3)
where k, l ∈ [−L, L]; L is an integer greater than 1, for example, L=10; and a size of the common checkerboard kernel Kbox is (2L+1)*(2L+1).
It is possible to directly use Equation (3) to calculate a checkerboard kernel transformation value. Optionally, the element kG(k, l) can be used to calculate a checkerboard kernel transformation value after being updated to a normalized value according to Equation (4).
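Equation (3) can be illustrated with the sketch below. Because the concrete forms of Equations (1), (2) and (4) are not reproduced here, the sketch makes explicit assumptions: the common checkerboard kernel is taken as kbox(k,l) = sign(k)*sign(l), the two-dimensional Gaussian function is taken as exp(−ε²(k²+l²)), and the optional normalization is omitted. These forms are hypothetical stand-ins, not the equations of the disclosure.

```python
import math
from typing import List


def sign(v: int) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def gaussian_checkerboard_kernel(L: int, eps: float) -> List[List[float]]:
    """Return kG as a (2L+1) x (2L+1) list of lists, indexed [k+L][l+L]."""
    kernel = []
    for k in range(-L, L + 1):
        row = []
        for l in range(-L, L + 1):
            k_box = sign(k) * sign(l)  # ASSUMED checkerboard form
            phi = math.exp(-(eps ** 2) * (k * k + l * l))  # ASSUMED Gaussian form
            row.append(k_box * phi)  # Equation (3): kG = kbox * Gaussian
        kernel.append(row)
    return kernel


# Small kernel for inspection; L=2 and eps=0.5 are arbitrary example values.
kg = gaussian_checkerboard_kernel(L=2, eps=0.5)
```

Under these assumptions, quadrants where k and l have the same sign are positive and the other two quadrants are negative, with weights decaying away from the center.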
In operation S703, a checkerboard kernel transformation value Δ(j) of each image patch in the image patch sequence is determined by summing the products of elements of a local similarity matrix LS[j] of each image patch Patch[j] in the image patch sequence and corresponding elements in the Gaussian checkerboard kernel KG(k, l), wherein the local similarity matrix LS of each image patch in the image patch sequence is determined based on elements in a local area corresponding to the image patch in the global similarity matrix. The checkerboard kernel transformation value Δ(j) can be determined according to Equation (5).
Δ(j) = Σ_{k,l∈[−L,L]} kG(k,l) * s(j+k, j+l)   (5)
where s(j+k, j+l) is an element in the global similarity matrix GS, and when j is too large or too small so that an element index j+k or j+l falls outside the range [jStart, jEnd], s(j+k, j+l) is set to zero. That is, when the checkerboard kernel transformation value Δ(j) is calculated, the local similarity matrix LS[j] is used, and the matrix LS[j] is a matrix composed of elements being centered on an element s(j, j) and being indexed within a range of [j−L, j+L] in the global similarity matrix GS, with a size of (2L+1)*(2L+1). That is, the elements in the local area corresponding to the image patch comprise a central element corresponding to the image patch on a main diagonal of the global similarity matrix and adjacent elements of the central element.
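Equation (5), including the zero padding for out-of-range indices, can be sketched as follows. The function name is hypothetical, positions are 0-based (position 0 stands for Patch[jStart]), and the toy kernel in the example is a plain checkerboard sign(k)*sign(l) without Gaussian weighting, chosen as an assumption so that the numbers are easy to check by hand.

```python
from typing import List, Sequence


def checkerboard_transform(
    gs: Sequence[Sequence[float]], kg: Sequence[Sequence[float]]
) -> List[float]:
    """Compute delta(j) = sum over k, l of kG(k, l) * s(j+k, j+l).

    kg is indexed [k+L][l+L]; entries of gs outside the matrix count as zero.
    """
    n = len(gs)
    L = (len(kg) - 1) // 2
    deltas = []
    for j in range(n):
        total = 0.0
        for k in range(-L, L + 1):
            for l in range(-L, L + 1):
                r, c = j + k, j + l
                if 0 <= r < n and 0 <= c < n:  # zero padding outside GS
                    total += kg[k + L][l + L] * gs[r][c]
        deltas.append(total)
    return deltas


# Toy GS: patches 0-2 show one target, patches 3-5 another (similarity 1 or 0).
block = [0, 0, 0, 1, 1, 1]
gs = [[1.0 if block[a] == block[b] else 0.0 for b in range(6)] for a in range(6)]
# ASSUMED plain checkerboard kernel with L=2.
sgn = lambda v: (v > 0) - (v < 0)
kg = [[float(sgn(k) * sgn(l)) for l in range(-2, 3)] for k in range(-2, 3)]
deltas = checkerboard_transform(gs, kg)
```

In this toy case the transformation value is largest at the two positions adjacent to the boundary between the blocks, which is where the identification-switch lies. Note that zero padding also raises the values at the two ends of the sequence.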
In operation S705, a candidate identification switch image patch is determined based on a highest peak in a transformation value curve. Specifically, in a case where a transformation value curve representing changes in the checkerboard kernel transformation value of each image patch has at least one peak, it is determined that a candidate identification switch image patch is present, and an image patch corresponding to a highest peak among the at least one peak is determined as the candidate identification switch image patch.
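The selection of the highest peak in operation S705 can be sketched as below. The disclosure does not define how a peak is detected; the sketch assumes that a peak is an interior position whose value is greater than its left neighbor and not less than its right neighbor. That definition and the function name are assumptions for illustration.

```python
from typing import Optional, Sequence


def highest_peak(deltas: Sequence[float]) -> Optional[int]:
    """Return the position of the highest peak, or None if there is no peak."""
    peaks = [
        j
        for j in range(1, len(deltas) - 1)
        if deltas[j] > deltas[j - 1] and deltas[j] >= deltas[j + 1]  # ASSUMED peak test
    ]
    if not peaks:
        return None  # no candidate identification switch image patch
    return max(peaks, key=lambda j: deltas[j])


# Toy transformation value curve with peaks at positions 1 and 3.
candidate = highest_peak([0.1, 0.3, 0.2, 0.9, 0.4])
```

A flat curve has no peak under this definition, which corresponds to the case in which no candidate identification switch image patch is present.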
An initial tracklet in the method 100 may include a plurality of splitting points (that is, identification-switch has occurred multiple times). For this case, it is possible to determine all splitting points through recursive processing. In an embodiment, the method 100 can include recursive processing: updating the tracklet to each of two subtracklets respectively, and continuing to make attempts to split a current tracklet. When the number of times of splitting exceeds a predetermined number threshold of times (e.g., four times), making attempts to split the current tracklet is stopped; and the number of times of splitting refers to the number of times of splitting an initial tracklet to obtain the current tracklet. Further, when attempts are made to split the current tracklet, if no splitting occurs in the end (for example, no candidate identification switch image patch is found or a candidate identification switch image patch is not credible), the recursive processing is skipped, and the processing returns to the main program and outputs a processing result.
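The recursive processing can be sketched as follows. The callback `find_split` stands in for candidate determination plus verification and returns a 0-based splitting position or None; this callback, the function names, and the representation of patches as strings are assumptions for illustration. The stop test `n_splits > max_splits` is one reading of "exceeds a predetermined number threshold of times".

```python
from typing import Callable, List, Optional, Sequence

SplitFinder = Callable[[Sequence[str]], Optional[int]]


def split_recursively(
    patches: Sequence[str],
    find_split: SplitFinder,
    max_splits: int = 4,
    n_splits: int = 0,
) -> List[List[str]]:
    """Recursively split a tracklet; n_splits counts splits made so far."""
    if n_splits > max_splits:
        return [list(patches)]  # splitting budget exhausted
    js = find_split(patches)
    if js is None or not 0 <= js < len(patches) - 1:
        return [list(patches)]  # no credible candidate: end the recursion
    front, back = patches[: js + 1], patches[js + 1 :]
    return split_recursively(
        front, find_split, max_splits, n_splits + 1
    ) + split_recursively(back, find_split, max_splits, n_splits + 1)


def toy_find_split(patches: Sequence[str]) -> Optional[int]:
    """Toy stand-in: split where the (known) target label changes."""
    for j in range(len(patches) - 1):
        if patches[j] != patches[j + 1]:
            return j
    return None


# Each letter is the true target shown in a patch of one faulty tracklet.
parts = split_recursively(list("aaabbcc"), toy_find_split)
```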
The candidate identification switch image patch determined in the method 100 may not be a real splitting point used to eliminate identification-switch. It is thus possible to consider various conditions to verify whether the candidate identification switch image patch is credible. In an embodiment, it is verified whether it is credible that identification-switch has occurred at the candidate identification switch image patch based on at least one of a first condition C1, a second condition C2, and a third condition C3.
The first condition C1: in an original image corresponding to the candidate identification switch image patch, a largest occlusion rate of a bounding box of the candidate identification switch image patch is greater than a predetermined occlusion rate threshold oTh (e.g., 0.5). The original image is the image, in the image sequence captured by a camera and processed by multi-target tracking to output the tracklet, that contains the candidate identification switch image patch. An occlusion rate of an image patch in the original image can be represented as a ratio of the area of the overlap between a bounding box of an overlapping image patch and the bounding box of the image patch to the area of the bounding box of the image patch. When a concerned image patch (bounding box) overlaps with a plurality of other image patches (other bounding boxes) in the original image, each pair of overlapping image patches has its own overlapping area, so the concerned image patch has a plurality of occlusion rates, and the largest occlusion rate is the largest one of them. When a concerned image patch (bounding box) does not overlap with any other image patch (bounding box) in the original image, its occlusion rate, and hence its largest occlusion rate, is regarded as 0. Occlusion easily leads to identification-switch: the larger the occlusion rate, the more easily incorrect target matching occurs and produces a flawed tracklet; thus, this embodiment considers the first condition.
For example, in a case where it is determined that the candidate identification switch image patch satisfies the first condition (that is, occlusion has occurred to the candidate identification switch image patch, and the occlusion is severe), it is determined (verified) that it is credible that identification-switch has occurred at the candidate identification switch image patch.
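The occlusion rate and the first condition C1 can be sketched as below. The box format (x1, y1, x2, y2) in image coordinates and the function names are assumptions for illustration; the default threshold 0.5 is the example value of oTh given above.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # ASSUMED format: (x1, y1, x2, y2)


def occlusion_rate(box: Box, other: Box) -> float:
    """Overlap area of box and other, divided by the area of box."""
    ix = max(0.0, min(box[2], other[2]) - max(box[0], other[0]))
    iy = max(0.0, min(box[3], other[3]) - max(box[1], other[1]))
    area = (box[2] - box[0]) * (box[3] - box[1])
    return (ix * iy) / area if area > 0.0 else 0.0


def largest_occlusion_rate(box: Box, others: Sequence[Box]) -> float:
    """Largest occlusion rate over all other boxes (0.0 if there are none)."""
    return max((occlusion_rate(box, o) for o in others), default=0.0)


def satisfies_c1(box: Box, others: Sequence[Box], o_th: float = 0.5) -> bool:
    """First condition C1: largest occlusion rate is greater than oTh."""
    return largest_occlusion_rate(box, others) > o_th


# Candidate box and two other boxes from the same original image.
candidate_box = (0.0, 0.0, 10.0, 20.0)
other_boxes = [(2.0, 0.0, 12.0, 20.0), (8.0, 18.0, 12.0, 25.0)]
rate = largest_occlusion_rate(candidate_box, other_boxes)
credible = satisfies_c1(candidate_box, other_boxes)
```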
The second condition C2: an angle between a moving direction of a target in the candidate identification switch image patch and a moving direction of a target in a later image patch after the candidate identification switch image patch in the tracklet is greater than a predetermined angle threshold aTh. For example, in the tracklet Trk[i], a moving direction of a target in the candidate identification switch image patch Patch[j] is a first direction dir1, a moving direction of a target in a later image patch Patch[j+1] after the candidate identification switch image patch Patch[j] is a second direction dir2, and if an angle between the first direction dir1 and the second direction dir2 is very large (e.g., larger than 90 degrees, or larger than 150 degrees), it is very likely that incorrect target matching has occurred. A moving direction of a target can be determined, for example, according to a central location (in the image coordinate system) of a bounding box (current bounding box) of the image patch and a central location of a previous bounding box. For example, in a case where it is determined that the candidate identification switch image patch satisfies the second condition (that is, a moving direction of a target in the image patch has been greatly changed), it is determined (verified) that it is credible that identification-switch has occurred at the candidate identification switch image patch.
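The second condition C2 can be sketched as follows, with each moving direction taken as the displacement between the centers of the previous and current bounding boxes. The function names and the default threshold of 90 degrees (one of the example values mentioned above) are choices made for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def direction(prev_center: Point, cur_center: Point) -> Point:
    """Moving direction as the displacement from the previous box center."""
    return (cur_center[0] - prev_center[0], cur_center[1] - prev_center[1])


def angle_between_deg(d1: Point, d2: Point) -> float:
    """Angle in degrees between two directions (0.0 if either is zero)."""
    n1, n2 = math.hypot(*d1), math.hypot(*d2)
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    cos = (d1[0] * d2[0] + d1[1] * d2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def satisfies_c2(
    c_prev: Point, c_cand: Point, c_next: Point, a_th_deg: float = 90.0
) -> bool:
    """Second condition C2: angle between dir1 and dir2 is greater than aTh."""
    dir1 = direction(c_prev, c_cand)  # direction at Patch[j]
    dir2 = direction(c_cand, c_next)  # direction at Patch[j+1]
    return angle_between_deg(dir1, dir2) > a_th_deg


# The target appears to reverse direction right after the candidate patch.
reversed_motion = satisfies_c2((0.0, 0.0), (10.0, 0.0), (0.0, 0.0))
smooth_motion = satisfies_c2((0.0, 0.0), (10.0, 0.0), (20.0, 1.0))
```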
The third condition C3: a similarity Sim(Patch1k, Patch2k) between a key image patch Patch1k of a front tracklet Trk1 and a key image patch Patch2k of a back tracklet Trk2 is less than a predetermined image patch similarity threshold pTh. The similarity Sim(Patch1k, Patch2k) can be a cosine similarity between re-identification features F1k and F2k, wherein F1k is a re-identification feature of the key image patch Patch1k, and F2k is a re-identification feature of the key image patch Patch2k. The front tracklet Trk1 is a subtracklet before the candidate identification switch image patch Patch_sc in the tracklet Trk[i]. For example, if the candidate identification switch image patch is Patch[j], then Trk1 is an image patch sequence composed of Patch[jStart] to Patch[j−1]. The back tracklet Trk2 is a remaining subtracklet other than the front tracklet Trk1 in the tracklet. For example, if the candidate identification switch image patch is Patch[j], then Trk2 is an image patch sequence composed of Patch[j] to Patch[jEnd].
A key image patch of a subtracklet sTrk, which is either the front tracklet Trk1 or the back tracklet Trk2, is determined by: determining an r value of each image patch in the subtracklet sTrk; and selecting an image patch with a largest r value as the key image patch of the subtracklet sTrk. Here, r is associated with d, o and h/H_max. For example, r can be determined according to Equation (6).
r=d−o+h/H_max (6)
where d is a detection confidence of a bounding box of the image patch; o is an occlusion rate of the bounding box of the image patch; h is a height of the bounding box; and H_max is a maximum bounding box height in the subtracklet. In a variant example, r is a weighted sum of d, −o and h/H_max. For example, in a case where it is determined that the candidate identification switch image patch satisfies the third condition (that is, a re-identification feature of a key image patch has been greatly changed), it is determined (verified) that it is credible that identification-switch has occurred at the candidate identification switch image patch.
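The key image patch selection by Equation (6) can be sketched as below. The `PatchInfo` record and the function names are hypothetical containers introduced for the example; only the formula r = d − o + h/H_max is taken from the text.

```python
from typing import List, NamedTuple, Sequence


class PatchInfo(NamedTuple):
    d: float  # detection confidence of the bounding box
    o: float  # occlusion rate of the bounding box
    h: float  # height of the bounding box


def r_values(patches: Sequence[PatchInfo]) -> List[float]:
    """Equation (6): r = d - o + h / H_max for every patch of a subtracklet."""
    h_max = max(p.h for p in patches)
    return [p.d - p.o + p.h / h_max for p in patches]


def key_patch_index(patches: Sequence[PatchInfo]) -> int:
    """Position of the patch with the largest r value."""
    scores = r_values(patches)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy subtracklet: the second patch is unoccluded and has the tallest box.
sub_tracklet = [
    PatchInfo(d=0.90, o=0.6, h=100.0),
    PatchInfo(d=0.80, o=0.0, h=120.0),
    PatchInfo(d=0.95, o=0.1, h=90.0),
]
key_index = key_patch_index(sub_tracklet)
```

The re-identification features of the key image patches of Trk1 and Trk2 would then be compared against pTh to evaluate the third condition C3.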
With regard to verifying that identification-switch has occurred at the candidate identification switch image patch, it is possible to design a predetermined rule based on the first condition C1, the second condition C2, and the third condition C3 to improve the accuracy of verification. The predetermined rule can be artificially set based on experience, and can also be given through learning using a decision tree. The predetermined rule can consider all the three conditions. It is possible to use samples to perform learning, and a result given by the learning includes, preferably, a predetermined occlusion rate threshold oTh, a predetermined angle threshold aTh, and a predetermined image patch similarity threshold pTh.
A finer verification method is to set, for the third condition C3, different thresholds in different cases. Preferable values of the different thresholds can be determined through learning using samples.
For a sample tracklet, its identification-switch rate is 4.46%, and through the post-processing process described in the method 100, the identification-switch rate is decreased to 0.141%. This shows that the method 100 of the present disclosure can significantly reduce identification-switch in multi-target tracking.
In an embodiment of the present disclosure, there is provided a device for post-processing in multi-target tracking. Exemplary description will be made with reference to
In an embodiment of the present disclosure, there is provided another device for post-processing in multi-target tracking. Exemplary description will be made with reference to
An aspect of the present disclosure provides a non-transitory computer-readable storage medium having a program stored thereon. The program, when executed, causes a computer to make attempts to split a tracklet indicative of a trajectory of a single target by performing operations of: determining a re-identification feature set of an image patch sequence by determining a re-identification feature of each image patch in the image patch sequence of the tracklet; determining whether a candidate identification switch image patch is present in the tracklet based on feature similarities of a plurality of re-identification feature pairs in the re-identification feature set; in a case where a determination result is “yes”, verifying whether it is credible that identification-switch has occurred at the candidate identification switch image patch; and in a case where a verification result is “credible”, splitting the tracklet into two tracklets based on the candidate identification switch image patch. The program has a corresponding relationship with the method 100. For the further configuration situation of the program, reference may be made to the description of the method 100 of the present disclosure.
According to an aspect of the present disclosure, there is further provided an information processing apparatus.
The CPU 1201, the ROM 1202 and the RAM 1203 are connected to each other via a bus 1204. An input/output interface 1205 is also connected to the bus 1204.
The following components are connected to the input/output interface 1205: an input part 1206, including a soft keyboard and the like; an output part 1207, including a display such as a Liquid Crystal Display (LCD) and the like, as well as a speaker and the like; the storage part 1208 such as a hard disc and the like; and a communication part 1209, including a network interface card such as a LAN card, a modem and the like. The communication part 1209 executes communication processing via a network such as the Internet, a local area network, a mobile network or a combination thereof.
A driver 1210 is also connected to the input/output interface 1205 as needed. A removable medium 1211 such as a semiconductor memory and the like is installed on the driver 1210 as needed, such that programs read therefrom are installed in the storage part 1208 as needed.
The CPU 1201 can run a program corresponding to a method for post-processing in multi-target tracking.
In the embodiments of the present disclosure, the post-processing as involved can reduce identification-switch, so as to avoid adverse effects caused by occlusion, illumination and attitude changes on multi-target tracking.
The beneficial effects of the methods, devices, and storage media of the present disclosure include at least one of: reducing identification-switch, and improving the accuracy of multi-target tracking.
As described above, according to the present disclosure, the principle of post-processing in multi-target tracking which reduces identification-switch has been disclosed. It should be noted that, the effects of the solution of the present disclosure are not necessarily limited to the above-mentioned effects, and in addition to or instead of the effects described in the preceding paragraphs, any of the effects as shown in the specification or other effects that can be understood from the specification can be obtained.
Although the present invention has been disclosed above through the description with regard to specific embodiments of the present invention, it should be understood that those skilled in the art can design various modifications (including, where feasible, combinations or substitutions of features between various embodiments), improvements, or equivalents to the present invention within the spirit and scope of the appended claims. These modifications, improvements or equivalents should also be considered to be included within the protection scope of the present invention.
It should be emphasized that, the term “comprise/include” as used herein refers to the presence of features, elements, operations or assemblies, but does not exclude the presence or addition of one or more other features, elements, operations or assemblies.
In addition, the methods of the various embodiments of the present invention are not limited to being executed in the time order as described in the specification or as shown in the accompanying drawings, and may also be executed in other time orders, in parallel or independently. Therefore, the execution order of the methods as described in the specification does not constitute a limitation to the technical scope of the present invention.
The present disclosure includes but is not limited to the following solutions.
1. A computer-implemented method for post-processing in multi-target tracking, characterized by comprising making attempts to split a tracklet indicative of a trajectory of a single target by performing operations of:
2. The method according to Appendix 1, wherein determining whether a candidate identification switch image patch is present in the tracklet based on feature similarities of a plurality of re-identification feature pairs in the re-identification feature set comprises:
3. The method according to Appendix 2, wherein in a case where it is determined that a special feature similarity less than a predetermined similarity threshold is present in the feature similarities, it is determined that a candidate identification switch image patch is present in the tracklet; and
4. The method according to Appendix 1, wherein determining whether a candidate identification switch image patch is present in the tracklet based on feature similarities of a plurality of re-identification feature pairs in the re-identification feature set comprises:
5. The method according to Appendix 4, wherein determining whether the candidate identification switch image patch is present in the tracklet based on the global similarity matrix comprises:
6. The method according to Appendix 5, wherein the elements in the local area corresponding to the image patch comprise a central element corresponding to the image patch on a main diagonal of the global similarity matrix and adjacent elements of the central element.
7. The method according to Appendix 1, wherein the operations further comprise: updating the tracklet to each of the two tracklets respectively, and continuing to make attempts to split a current tracklet.
8. The method according to Appendix 7, wherein when the number of times of splitting exceeds a predetermined number threshold of times, making attempts to split the current tracklet is stopped; and the number of times of splitting refers to the number of times of splitting an initial tracklet to obtain the current tracklet.
9. The method according to Appendix 8, wherein the predetermined number threshold of times is 4.
10. The method according to Appendix 1, wherein it is verified whether it is credible that identification-switch has occurred at the candidate identification switch image patch based on at least one of the following conditions:
11. A device for post-processing in multi-target tracking, characterized by comprising:
12. The device according to Appendix 11, wherein determining whether a candidate identification switch image patch is present in the tracklet based on feature similarities of a plurality of re-identification feature pairs in the re-identification feature set comprises:
13. The device according to Appendix 12, wherein in a case where it is determined that a special feature similarity less than a predetermined similarity threshold is present in the feature similarities, it is determined that a candidate identification switch image patch is present in the tracklet; and
14. The device according to Appendix 11, wherein determining whether a candidate identification switch image patch is present in the tracklet based on feature similarities of a plurality of re-identification feature pairs in the re-identification feature set comprises:
15. The device according to Appendix 14, wherein determining whether the candidate identification switch image patch is present in the tracklet based on the global similarity matrix comprises:
16. The device according to Appendix 15, wherein the elements in the local area corresponding to the image patch comprise a central element corresponding to the image patch on a main diagonal of the global similarity matrix and adjacent elements of the central element.
17. The device according to Appendix 11, wherein the operations further comprise: updating the tracklet to each of the two tracklets respectively, and continuing to make attempts to split a current tracklet.
18. The device according to Appendix 17, wherein when the number of times of splitting exceeds a predetermined number threshold of times, making attempts to split the current tracklet is stopped; and
19. The device according to Appendix 11, wherein it is verified whether it is credible that identification-switch has occurred at the candidate identification switch image patch based on at least one of the following conditions:
20. A non-transitory computer-readable storage medium having a program stored thereon, wherein the program, when executed, causes a computer to make attempts to split a tracklet indicative of a trajectory of a single target by performing operations of:
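For illustration only, the repeated splitting described in Appendices 4 to 9 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: cosine similarity as the feature similarity, the threshold value 0.5, the use of only the element adjacent to the main diagonal as the "local area", the omission of the credibility verification of Appendix 10, and all function names (`cosine`, `find_candidate`, `split_tracklet`) are hypothetical choices made for this example. Only the stopping threshold of 4 is taken from Appendix 9.

```python
import math
from typing import List, Optional, Sequence

Feature = Sequence[float]


def cosine(a: Feature, b: Feature) -> float:
    """Cosine similarity of two re-identification feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(feats: Sequence[Feature]) -> List[List[float]]:
    """Global similarity matrix: element [i][j] compares patch i with patch j."""
    return [[cosine(a, b) for b in feats] for a in feats]


def find_candidate(feats: Sequence[Feature], threshold: float) -> Optional[int]:
    """Return the index of the candidate identification switch image patch.

    Assumption: the local area of patch i is reduced to the single element
    next to the main diagonal, sim[i - 1][i]. The patch whose value is lowest
    and below the threshold is the candidate; None means no candidate.
    """
    sim = similarity_matrix(feats)
    best, best_i = threshold, None
    for i in range(1, len(feats)):
        if sim[i - 1][i] < best:
            best, best_i = sim[i - 1][i], i
    return best_i


def split_tracklet(feats: Sequence[Feature], threshold: float = 0.5,
                   max_splits: int = 4, depth: int = 0) -> List[List[Feature]]:
    """Recursively split a tracklet, given as per-patch features in time order.

    `depth` counts the splits of the initial tracklet that produced the
    current one; attempts stop once it exceeds `max_splits`.
    """
    if depth > max_splits or len(feats) < 2:
        return [list(feats)]
    cut = find_candidate(feats, threshold)
    if cut is None:  # no candidate found: keep the tracklet whole
        return [list(feats)]
    # Credibility verification of the candidate is omitted in this sketch.
    return (split_tracklet(feats[:cut], threshold, max_splits, depth + 1)
            + split_tracklet(feats[cut:], threshold, max_splits, depth + 1))


# Toy tracklet: two patches of one target followed by two patches of another.
toy = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0), (0.1, 1.0)]
parts = split_tracklet(toy)
```

In this toy input the adjacent similarity drops sharply between the second and third patches, so the sketch splits the tracklet there and leaves each half unsplit.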
Number | Date | Country | Kind |
---|---|---|---|
202210887240.2 | Jul 2022 | CN | national |