Occluded pedestrian re-identification method based on pose estimation and background suppression

Information

  • Patent Grant
  • Patent Number
    11,908,222
  • Date Filed
    Tuesday, October 17, 2023
  • Date Issued
    Tuesday, February 20, 2024
Abstract
The present application relates to an occluded pedestrian re-identification method, including steps of obtaining global features and local features of occluded pedestrians, and recombining the local features into a local feature map; obtaining a heat map of key-points of pedestrian images and a group of key-point confidences, and obtaining a group of features of the pedestrian key-points by using the local feature map and the heat map; obtaining a local feature group by using the global features, through Conv, to enhance each key-point feature in the group of features of pedestrian key-points, and obtaining an adjacency matrix of key-points through the key-points; the local feature group and the adjacency matrix of key-points are used as the input of a GCN to obtain the final features of pedestrian key-points.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority of Chinese Patent Application No. 202211593464.9, filed on Dec. 13, 2022, which is hereby incorporated by reference in its entirety.


TECHNICAL FIELD

The present invention relates to occluded pedestrian re-identification technology and belongs to the field of computer vision, and more particularly to an occluded pedestrian re-identification method based on pose estimation and background suppression.


BACKGROUND

Pedestrian re-identification, as one of the important research topics in the field of computer vision, aims to correlate pedestrian images captured at different physical locations by different cameras to achieve pedestrian recognition and retrieval across cameras and scenes, and is widely used in smart business, intelligent security and other fields. However, in real scenes, the pedestrian images captured by the cameras are often blocked by objects or other pedestrians, so a robust feature expression of pedestrians cannot be extracted.


The existing occluded person re-identification methods have achieved relatively good results, but they are still plagued by problems caused by occlusion: the features of the unoccluded parts of the pedestrians are the key to the network's identification of pedestrians, and if occlusion features are introduced into the model, the recognition ability will be reduced; matching local pedestrian features can effectively improve the recognition ability of the model, but occlusion will lead to misalignment of the local pedestrian features, resulting in wrong matching of local features. At the same time, an attention mechanism can assign weights to the visible parts of the human body, which can effectively reduce the negative impact of the cluttered background.


Based on the above, the present invention proposes an occluded pedestrian re-identification method based on pose estimation and background suppression.


SUMMARY

The purpose of this invention is to propose an occluded pedestrian re-identification method based on pose estimation and background suppression to address the shortcomings of existing technologies. Firstly, by constructing a graph convolutional module of local feature enhancement, we aim to embed the context information contained in the global features into the local features to enhance the expression of the local features and obtain the connection between feature nodes. At the same time, the heat map of the key-points of the pedestrians and the feature map of the overall pedestrians obtained by the pedestrian pose estimation module are used as the two inputs of the attention-guided background suppression module to further focus the model on the visible parts of pedestrians, so as to obtain a more robust feature expression of pedestrians.


The technical solutions adopted by the invention to solve the technical problems are as follows:


An occluded pedestrian re-identification method based on pose estimation and background suppression is characterized by including the following steps:


Step (1) Construct a pedestrian feature extraction backbone network based on ViT (Visual-Transformer) to obtain the global features ƒcls and the local features ƒƒ_local of occluded pedestrians, and recombine the local features ƒƒ_local into the local feature map ƒlocal.


Step (2) Obtain the heat map of the key-points of the pedestrian images ƒpos and the group of key-point confidences Vkc by the pre-trained ViTPose (Human Pose Estimation), and then obtain the group of features of pedestrian key-points ƒkeypoints by using the local feature map ƒlocal obtained in step (1) and the heat map ƒpos.


Step (3) Construct a graph convolutional module of local feature enhancement composed of Conv and GCN; through Conv, obtain the local feature group ƒkp_en by using the global features ƒcls to enhance each key-point feature in the group of features of pedestrian key-points ƒkeypoints, and obtain the adjacency matrix of key-points A through the key-points; finally, the local feature group ƒkp_en and the adjacency matrix of key-points A are used as the input of the GCN to obtain the final features of pedestrian key-points ƒƒ_keypoints.


Step (4) Construct an attention background suppression module composed of global average pooling and convolutional networks, and then the local feature map ƒlocal obtained in step (1) and the heat map ƒpos obtained in step (2) are input into the attention background suppression module to obtain the output pedestrian features ƒatt_local, which are segmented as the final features.


Step (5) Construct an occluded pedestrian re-identification model (ReID) from the pedestrian feature extraction backbone network, ViTPose, the graph convolutional module of local feature enhancement and the attention background suppression module, and then train the model using the global pedestrian features ƒcls in step (1), the features of pedestrian key-points ƒƒ_keypoints in step (3) and the pedestrian features ƒatt_local in step (4) to obtain the final occluded pedestrian re-identification model.


The beneficial effects of the invention are as follows:


The invention designs a graph convolutional module of local feature enhancement, which uses the context information of global features to enhance the local feature expression of pedestrians and obtain the feature connection between each key-point of pedestrians. This is beneficial for the model to learn the features of the unoccluded pedestrian parts and realize the alignment of features between the pedestrian parts, thereby improving the recognition of the pedestrian features. Secondly, in order to reduce the influence of background information of pedestrian images, the invention designs an attention-guided background suppression module, which guides the model to pay more attention to the distinguishing features related to pedestrian features, so as to obtain more discriminative pedestrian features. The results show that the features extracted by this method have better robustness and effectively improve the generalization ability of the model.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 is the flow chart of the overall implementation scheme of the invention.



FIG. 2 is the schematic diagram of the backbone network of pedestrian feature extraction of the invention.



FIG. 3 is the schematic diagram of pedestrian pose estimation of the invention.



FIG. 4 is the schematic diagram of the convolution module of the local feature enhancement of the invention.



FIG. 5 is the schematic diagram of the attention-guided background suppression module of the invention.



FIG. 6 is the overall structure diagram of the invention.





DETAILED DESCRIPTIONS

The following is a further description of the invention in combination with the attached figures.



FIG. 1 is the flow chart of the overall implementation scheme of the invention, illustrating an occluded pedestrian re-identification method based on pose estimation and background suppression, as shown in FIG. 1, including the following steps:


Step (1) Construct a pedestrian feature extraction backbone network based on Visual-Transformer to obtain the global features ƒcls and the local features ƒƒ_local of occluded pedestrians, and recombine the local features ƒƒ_local into the local feature map ƒlocal;


Step (2) Obtain the heat map of the key-points of the pedestrian images ƒpos and the group of key-point confidences Vkc by the pre-trained pedestrian pose estimation module, and then obtain the group of features of pedestrian key-points ƒkeypoints by using the local feature map ƒlocal obtained in step (1) and the heat map ƒpos;


Step (3) Construct a graph convolutional module of local feature enhancement, and then use the global features ƒcls to enhance each key-point feature in the group of features of pedestrian key-points ƒkeypoints. The enhanced group of features and the adjacency matrix of key-points A are used as the input of the graph convolutional network to obtain the final features of pedestrian key-points ƒƒ_keypoints;


Step (4) Construct an attention-guided background suppression module, and then the local feature map ƒlocal obtained by step (1) and the heat map ƒpos obtained by step (2) are input into the attention background suppression module to obtain the output pedestrian features ƒatt_local, which are segmented as the final features;


Step (5) Train the model using the global pedestrian features ƒcls in step (1), the features of pedestrian key-points ƒƒ_keypoints in step (3) and the pedestrian features ƒatt_local in step (4) to obtain the final occluded pedestrian re-identification model.


Further, the specific implementation process of step (1) is as follows:


1-1 Use the Visual-Transformer (ViT) pre-trained on ImageNet as the backbone network to extract the pedestrian features in the image. Before the images are input into ViT, extract features from the images by a small convolutional network in order to deal with the problem of unstable ViT training, as shown in equation (1):

x=Conv(X)  (1)


here, X represents the pedestrian image, Conv represents the convolutional network, and x is the pedestrian features output by the convolutional network.


1-2 Referring to FIG. 2, the feature map sequence xp={xpi|i=1,2, . . . N} is generated by segmenting the pedestrian features x obtained in 1-1 according to the preset patch size p, where N is the number of separable patches, and then the camera perspective information [CAM_VIEW], whose dimension is the same as xp, is added to xp, as shown in equation (2):

xp=xp+λ*Ecam_view  (2)


here, λ is a hyperparameter representing the weight of the camera perspective information, and Ecam_view is [CAM_VIEW] representing the camera perspective information.


Add [CLS_TOKEN] representing the global feature and the position information coding [POS_TOKEN] to xp, and then the feature Z can be obtained after linear coding, as shown in equation (3):

Z=[xcls;linear(xpi)]+Epos  (3)


here, xcls is the global feature vector [CLS_TOKEN]; linear(·) is a linear coding function; Epos is [POS_TOKEN] representing the spatial position.
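The embedding described in 1-1 and 1-2 can be summarized by the following PyTorch-style sketch. The hidden dimension, patch size, number of patches and number of cameras are illustrative assumptions, and the small convolutional network of equation (1) and the patch split of 1-2 are folded into one strided convolution for brevity, so this is a sketch rather than the exact patented design:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Sketch of equations (1)-(3): convolutional stem, patch sequence,
    camera-view embedding [CAM_VIEW], class token [CLS_TOKEN] and position
    encoding [POS_TOKEN]. All sizes are placeholders, not the patented values."""
    def __init__(self, in_ch=3, dim=768, patch=16, num_patches=128,
                 num_cams=6, lam=3.0):
        super().__init__()
        self.conv_stem = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)  # eq (1)
        self.cam_embed = nn.Parameter(torch.zeros(num_cams, 1, dim))        # E_cam_view
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))               # x_cls
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim)) # E_pos
        self.linear = nn.Linear(dim, dim)                                   # linear(.)
        self.lam = lam                                                      # hyperparameter lambda

    def forward(self, images, cam_ids):
        x = self.conv_stem(images)                  # eq (1): x = Conv(X)
        x = x.flatten(2).transpose(1, 2)            # patch sequence x_p, shape (B, N, C)
        x = x + self.lam * self.cam_embed[cam_ids]  # eq (2): x_p + lambda * E_cam_view
        cls = self.cls_token.expand(x.size(0), -1, -1)
        z = torch.cat([cls, self.linear(x)], dim=1) + self.pos_embed  # eq (3)
        return z
```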


1-3 Input the features Z obtained in 1-2 into ViT, which is a stack of multiple Transformer blocks, as shown in equations (4) and (5):

Zl′=MSA(LN(Zl−1))+Zl−1, l=1 . . . L  (4)
Zl=MLP(LN(Zl′))+Zl′, l=1 . . . L  (5)


here, Zl represents the output features of the Transformer block in layer l, Zl′ is the intermediate result in the Transformer block, and L is the total number of layers; MSA(·) is the multi-head attention, LN(·) is the layer normalization, and MLP(·) is the multilayer perceptron.


The network output is the output features of the last layer, which are the global features ƒcls ∈ R1×C and the local feature group ƒƒ_local. Rearrange the local feature group ƒƒ_local to obtain the local feature map ƒlocal, as shown in equation (6):

ƒlocal=reshape(ƒƒ_local)  (6)


here, reshape(·) is the rearrangement function.
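Equations (4)-(6) correspond to a standard pre-norm Transformer block followed by a token-to-map reshape; a minimal sketch follows, where the head count and MLP ratio are assumptions:

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Sketch of equations (4)-(5): pre-norm multi-head attention and MLP,
    each with a residual connection. Head count and MLP ratio are assumptions."""
    def __init__(self, dim=768, heads=12, mlp_ratio=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                                 nn.Linear(dim * mlp_ratio, dim))

    def forward(self, z):
        h = self.ln1(z)
        z = z + self.msa(h, h, h)[0]   # eq (4): Z'_l = MSA(LN(Z_{l-1})) + Z_{l-1}
        z = z + self.mlp(self.ln2(z))  # eq (5): Z_l  = MLP(LN(Z'_l)) + Z'_l
        return z

def split_outputs(z_last, h, w):
    """Eq (6): take f_cls from the class token and reshape the remaining
    tokens into the local feature map f_local of shape (B, C, h, w)."""
    f_cls = z_last[:, 0]
    f_local = z_last[:, 1:].transpose(1, 2).reshape(z_last.size(0), -1, h, w)
    return f_cls, f_local
```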


Further, the specific implementation process of step (2) is as follows:


2-1 Referring to FIG. 3, extract the key-points of the pedestrian images by ViTPose, which is pre-trained on the COCO dataset. ViTPose outputs the heat map of the pedestrian key-points ƒpos and the coordinate confidences of the key-points Vkc in the pedestrian images. For the pedestrian re-identification task, the final output of the model is selected, namely the heat map of the pedestrian key-points ƒpos and the key-point set Vkc={V1, V2, . . . , VS}, among which VS is a key-point of the human body obtained by the pedestrian key-point algorithm, as shown in equation (7):

ƒpos,Vkc=ViTPose(Image)  (7)


here, VS={kx, ky, kc}, kx, ky are the coordinates of key-points respectively, and kc is the key-point confidence; ƒpos is the heat map of key-points output by ViTPose;


2-2 Using the local feature map ƒlocal obtained in 1-3 and the heat map of the key-points of the pedestrians ƒpos obtained in 2-1, S local features of pedestrian key-points can be obtained by vector outer product and global average pooling, as shown in equation (8):

ƒkeypoints=GAP(ƒlocal⊗ƒpos)  (8)


here, GAP is the global average pooling; the group of features of pedestrian key-points ƒkeypoints ∈ RS×C, S is the number of key-points and C is the number of feature channels.
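Equation (8) can be read as weighting the local feature map by each key-point heat map and then pooling. A minimal sketch is shown below; resizing the heat map to the feature-map resolution with bilinear interpolation is an assumption:

```python
import torch
import torch.nn.functional as F

def keypoint_features(f_local, f_pos):
    """Sketch of equation (8): pool the local feature map with each key-point
    heat map to get one C-dimensional feature per key-point.
    f_local: (B, C, H, W) local feature map; f_pos: (B, S, H, W) heat maps."""
    f_pos = F.interpolate(f_pos, size=f_local.shape[-2:], mode='bilinear',
                          align_corners=False)
    # Outer-product-style weighting followed by global average pooling:
    # f_keypoints[b, s, c] = mean over (h, w) of f_local[b, c, h, w] * f_pos[b, s, h, w]
    f_keypoints = torch.einsum('bchw,bshw->bsc', f_local, f_pos) / (
        f_local.shape[-2] * f_local.shape[-1])
    return f_keypoints  # (B, S, C)
```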


Further, the specific implementation process of step (3) is as follows:


3-1 In the case of occlusion, the local features extracted in 2-2 often cannot represent the unoccluded part of the pedestrians robustly, while the context information contained in the global features can further enhance the expression of local features. Therefore, the relationship between global features and local features is used to further enrich local features. Firstly, the group of features of pedestrian key-points can be expressed as equation (9):

ƒkeypointskeypointsi|i=0,1, . . . ,S}  (9)


Secondly, apply a 1*1 convolution to each key-point feature and the global feature ƒcls, as shown in equations (10) and (11):

ƒkp_conv=Conv1×1(ƒkeypoints)  (10)
ƒcls_conv=Conv1×1(ƒcls)  (11)


here, ƒkp_conv is the feature after convolution of each local feature, and ƒcls_conv is the feature after convolution of global features.


3-2 By using the features of key-points and global features obtained in 3-1, the local feature group of enhanced key-points ƒkp_en is calculated by vector quantity product, softmax and addition, as shown in equations (12) and (13):

Vsim=Softmax(ƒkp_conv⊙ƒcls_conv)  (12)
ƒkp_en=Conv(ƒkeypoints+w*(ƒcls_conv+Vsim*ƒcls))  (13)


here, Conv is the convolution operation; w is the learnable weight; Vsim is the similarity.


3-3 By using the predefined adjacency matrix of pedestrian key-points A and the local feature group ƒkp_en obtained in 3-2 as the input of the graph convolutional network, output the final features of pedestrian key-points ƒƒ_keypoints by the graph convolutional network, as shown in equation (14):

ƒƒ_keypoints=GCN(A,ƒkp_en)  (14)


here, GCN is the graph convolutional network, and A is the predefined adjacency matrix of human key-points.


3-4 The processes described in 3-1, 3-2 and 3-3 constitute the graph convolutional module of local feature enhancement, with reference to FIG. 4.
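A minimal sketch of the module in 3-1 to 3-3 follows. The reading of the product in equation (12) as a channel-wise dot product followed by a softmax over key-points, the single-layer row-normalized GCN, and the layer sizes are assumptions rather than the exact patented design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalFeatureEnhanceGCN(nn.Module):
    """Sketch of equations (10)-(14): enhance each key-point feature with the
    global feature, then run one graph-convolution layer over the predefined
    key-point adjacency matrix A. Layer sizes are illustrative assumptions."""
    def __init__(self, dim=768):
        super().__init__()
        self.kp_conv = nn.Conv1d(dim, dim, kernel_size=1)    # eq (10), 1*1 conv
        self.cls_conv = nn.Linear(dim, dim)                   # eq (11)
        self.fuse_conv = nn.Conv1d(dim, dim, kernel_size=1)   # Conv in eq (13)
        self.w = nn.Parameter(torch.tensor(1.0))              # learnable weight w
        self.gcn_weight = nn.Linear(dim, dim)                 # GCN layer weight

    def forward(self, f_keypoints, f_cls, A):
        # f_keypoints: (B, S, C); f_cls: (B, C); A: (S, S) adjacency matrix
        f_kp_conv = self.kp_conv(f_keypoints.transpose(1, 2)).transpose(1, 2)
        f_cls_conv = self.cls_conv(f_cls).unsqueeze(1)                       # (B, 1, C)
        # eq (12): similarity between each key-point feature and the global feature
        v_sim = F.softmax((f_kp_conv * f_cls_conv).sum(-1, keepdim=True), dim=1)
        # eq (13): add the global context back into each key-point feature
        f_kp_en = f_keypoints + self.w * (f_cls_conv + v_sim * f_cls.unsqueeze(1))
        f_kp_en = self.fuse_conv(f_kp_en.transpose(1, 2)).transpose(1, 2)
        # eq (14): one graph-convolution step over the row-normalized adjacency
        a_hat = A / A.sum(-1, keepdim=True).clamp(min=1)
        f_f_keypoints = F.relu(self.gcn_weight(torch.matmul(a_hat, f_kp_en)))
        return f_f_keypoints  # (B, S, C)
```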


Further, the specific implementation process of step (4) is as follows:


4-1 In order to guide the attention to further focus on the unoccluded parts of the pedestrians, thereby suppressing the cluttered background, the global average pooling of the heat map of the key-points output by the pose estimation network is used as the features of pedestrian key-points, which are supplemented to the local features.


4-2 The local feature map output by the backbone network is treated as a graph structure, meaning there are H*W nodes in the graph and each node is a C-dimensional feature. First, input the local feature map ƒlocal into two 1*1 convolutional networks, and then transpose the output of one of the convolutional networks to construct the relationship between nodes, as shown in equation (15):

Ri,j=Conv(ƒlocal)T Conv(ƒlocal)  (15)


here, Ri,j is the matrix of relational feature, and Conv is a convolutional network.


4-3 The matrix of relational features Ri,j is used to obtain the features of spatial perception ƒsp of the corresponding relationship, and then the local feature map ƒlocal, the features of pedestrian key-points ƒpos in 4-1 and the features of spatial perception ƒsp are embedded into the link, as shown in equations (16) and (17):

ƒsp=Conv(Ri,j)  (16)
ƒconcat=Concat[Conv(ƒlocal),Conv(ƒsp),Conv(ƒpos)]  (17)


here, ƒsp are the features of spatial perception, Concat(·) is the channel link function, and ƒconcat are the connected feature vectors.


Input ƒconcat into a 1*1 convolutional network and a Sigmoid to obtain the spatial attention map ƒatten, and finally the final pedestrian feature map ƒatt_local is obtained by multiplying the spatial attention map ƒatten with the local feature map ƒlocal.


4-4 After that, construct multiple classification heads according to the pedestrian structure, and the pedestrian feature map ƒatt_local is divided into four local features ƒ1, ƒ2, ƒ3, ƒ4 to classify the pedestrian images.


4-5 The process described in 4-1, 4-2, 4-3, 4-4 constitutes the attention-guided background suppression module, with reference to FIG. 5.
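A minimal sketch of 4-1 to 4-3 is given below. The way the relation matrix Ri,j is collapsed into a per-node spatial map before equation (16), the resizing of the heat map, and the channel widths are assumptions, not the exact patented layout:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackgroundSuppression(nn.Module):
    """Sketch of equations (15)-(17): node-relation matrix, spatial-perception
    features, concatenation with the pose heat map, and a sigmoid spatial
    attention map that re-weights f_local. Channel widths are placeholders."""
    def __init__(self, dim=768, pose_ch=17, mid=256):
        super().__init__()
        self.q = nn.Conv2d(dim, mid, 1)
        self.k = nn.Conv2d(dim, mid, 1)
        self.sp = nn.Conv2d(1, mid, 1)          # Conv on R_{i,j}, eq (16)
        self.local_proj = nn.Conv2d(dim, mid, 1)
        self.pos_proj = nn.Conv2d(pose_ch, mid, 1)
        self.attn = nn.Conv2d(3 * mid, 1, 1)    # 1*1 conv before the sigmoid

    def forward(self, f_local, f_pos):
        b, c, h, w = f_local.shape
        f_pos = F.interpolate(f_pos, size=(h, w), mode='bilinear', align_corners=False)
        q = self.q(f_local).flatten(2)                      # (B, mid, HW)
        k = self.k(f_local).flatten(2)                      # (B, mid, HW)
        r = torch.bmm(q.transpose(1, 2), k)                 # eq (15): (B, HW, HW)
        # Collapse R_{i,j} to one response per node (an assumed reduction),
        # then apply the convolution of eq (16):
        f_sp = self.sp(r.mean(-1).view(b, 1, h, w))
        f_concat = torch.cat([self.local_proj(f_local), f_sp,
                              self.pos_proj(f_pos)], dim=1)  # eq (17)
        f_atten = torch.sigmoid(self.attn(f_concat))         # spatial attention map
        return f_local * f_atten                             # f_att_local
```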


Further, the specific implementation process of step (5) is as follows:


5-1 The labeled data in the pedestrian re-identification dataset is used as the supervision information, and for each training batch the network is trained with the ID loss, which uses the cross-entropy loss shown in equation (18), together with the hard triplet loss:










Lid=Σi=1N−qi log(pi)  (18)







here, N is the number of pedestrian categories, qi is the supervised label and pi is the predictive label.


The hard triplet loss randomly samples P identities, and for each identity extracts K instances to form a mini-batch of size P*K; each image xa in the batch is selected in turn as the anchor, and the farthest positive sample xp and the nearest negative sample xn in the batch are selected to form a triplet to train the network, in order to enhance the generalization ability of the network, as shown in equation (19):










Ltriplet=Σi=1PΣa=1K[m+max1≤p≤K∥ƒ(xi,a)−ƒ(xi,p)∥2−minj=1 . . . P, n=1 . . . K, j≠i∥ƒ(xi,a)−ƒ(xj,n)∥2]+  (19)

here, the max term corresponds to the hardest positive sample for the anchor and the min term corresponds to the hardest negative sample.
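For reference, a minimal PyTorch sketch of the two training losses is shown below. The batch-hard formulation averages over anchors rather than summing, and the margin value is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def id_loss(logits, labels):
    """Eq (18): cross-entropy ID loss over the N pedestrian categories."""
    return F.cross_entropy(logits, labels)

def hard_triplet_loss(features, labels, margin=0.3):
    """Sketch of eq (19): batch-hard triplet loss. For each anchor, take the
    farthest positive and the nearest negative in the P*K mini-batch."""
    dist = torch.cdist(features, features, p=2)              # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)         # (B, B) same-identity mask
    pos_dist = dist.masked_fill(~same, float('-inf')).max(dim=1).values  # hardest positive
    neg_dist = dist.masked_fill(same, float('inf')).min(dim=1).values    # hardest negative
    return F.relu(margin + pos_dist - neg_dist).mean()
```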







5-2 FIG. 6 shows the overall architecture of the network. Using the global features ƒcls obtained in step (1), the final group of features of pedestrian key-points ƒƒ_keypoints obtained in 3-3, and the local features ƒ1, ƒ2, ƒ3, ƒ4 obtained by dividing the pedestrian features ƒatt_local in 4-4, train the occluded pedestrian re-identification model to obtain the final model; the overall loss can be expressed as follows:









Loss=Lid(ƒcls)+Ltriplet(ƒcls)+(1/S)Σi=1S kci[Lid(ƒƒ_keypointsi)+Ltriplet(ƒƒ_keypointsi)]+(1/k)Σi=1k(Lid(ƒi)+Ltriplet(ƒi))  (20)







here, S is the number of pedestrian key-points, and kc is the key-point confidence obtained in 2-1.
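A minimal sketch of how the branch losses of equation (20) could be combined is given below. The per-branch logits and features, the averaging of each key-point confidence over the batch, and the helper functions id_loss and hard_triplet_loss from the sketch after equation (19) are assumptions:

```python
def total_loss(f_cls_logits, f_cls_feat, kp_logits, kp_feats, kc,
               part_logits, part_feats, labels, margin=0.3):
    """Sketch of eq (20): global-branch loss, confidence-weighted key-point
    losses, and the averaged losses of the k divided local features.
    kc: (B, S) key-point confidences; kp_* and part_* are per-branch lists."""
    loss = id_loss(f_cls_logits, labels) + hard_triplet_loss(f_cls_feat, labels, margin)
    s = len(kp_feats)                           # number of key-points S
    loss = loss + sum(
        kc[:, i].mean() * (id_loss(kp_logits[i], labels) +
                           hard_triplet_loss(kp_feats[i], labels, margin))
        for i in range(s)) / s
    k = len(part_feats)                         # divided local features f_1 .. f_k
    loss = loss + sum(
        id_loss(part_logits[i], labels) + hard_triplet_loss(part_feats[i], labels, margin)
        for i in range(k)) / k
    return loss
```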


5-3 When training is stable, the final ReID model is obtained. In the test stage, the query image q and the test set image t are input into the final occluded pedestrian re-identification model for feature extraction to obtain their key-point features ƒq and ƒt respectively, and then graph matching is used to optimize the comparison, as shown in equation (21):









GM=(1/S)Σi=1S kc_qi kc_ti cosine(ƒqi,ƒti)  (21)







here, kc_qi and kc_ti are the i-th key-point confidences of the image q and t respectively; cosine is the cosine distance.
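A minimal sketch of the matching score in equation (21) follows; it uses cosine similarity (higher means more alike) in place of the cosine distance named above, which is an assumption about the intended sign of the score:

```python
import torch
import torch.nn.functional as F

def graph_matching_score(f_q, f_t, kc_q, kc_t):
    """Sketch of eq (21): confidence-weighted mean cosine similarity between
    the S key-point features of a query image and a test (gallery) image.
    f_q, f_t: (S, C) key-point features; kc_q, kc_t: (S,) confidences."""
    cos = F.cosine_similarity(f_q, f_t, dim=1)   # (S,) per-key-point similarity
    return (kc_q * kc_t * cos).mean()
```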


5-4 Compare the query image features with the test set image features to determine whether the images belong to the same identity, and output the pedestrian images of the same identity.

Claims
  • 1. An occluded pedestrian re-identification method based on pose estimation and background suppression, comprising steps of: step (1): constructing a pedestrian feature extraction backbone network based on ViT to obtain global features ƒcls and local features ƒƒ_local of occluded pedestrians, and recombining the local features ƒƒ_local into a local feature map ƒlocal; step (2): obtaining a heat map of key-points of pedestrian images ƒpos and a group of key-point confidences Vkc according to a pre-trained ViTPose, and then obtaining a group of features of pedestrian key-points ƒkeypoints by using the local feature map ƒlocal and the heat map ƒpos; step (3): constructing a graph convolutional module of local feature enhancement composed of Conv and GCN, and then obtaining a local feature group ƒkp_en by using the global features ƒcls to enhance each key-point feature in the group of features of pedestrian key-points ƒkeypoints through Conv, and obtaining an adjacency matrix of key-points A through the key-points, and finally the local feature group ƒkp_en and the adjacency matrix of key-points A are used as input of GCN to obtain final features of pedestrian key-points ƒƒ_keypoints; step (4): constructing an attention background suppression module composed of global average pooling and convolutional networks, inputting the local feature map ƒlocal obtained by step (1) and the heat map ƒpos obtained by step (2) into the attention background suppression module to output pedestrian features ƒatt_local, segmenting the pedestrian features as final features; step (5): constructing an occluded pedestrian re-identification model by the pedestrian feature extraction backbone network, the ViTPose, the graph convolutional module of local feature enhancement and the attention background suppression module, and then training the occluded pedestrian re-identification model by using the global pedestrian features ƒcls, the features of pedestrian key-points ƒƒ_keypoints and the pedestrian features ƒatt_local to obtain a final occluded pedestrian re-identification model.
  • 2. The method according to claim 1, wherein the pedestrian feature extraction backbone network is obtained based on an initial network of pedestrian feature extraction which adopts the ViT pre-trained on ImageNet, wherein before the images are input into ViT, extracting features from the images by a convolutional network, as shown in equation (1): x=Conv(X)  (1), wherein X represents the pedestrian image, Conv represents the convolutional network, and x is the pedestrian features output by the convolutional network.
  • 3. The method according to claim 2, wherein the step of obtaining the global features ƒcls and the local features ƒƒ_local of occluded pedestrians comprises: generating a feature map sequence xp={xpi|i=1,2, . . . N} by segmenting the obtained pedestrian features x according to a preset patch size p, wherein N is a separable quantity, and then adding camera perspective information [CAM_VIEW] to xp, the dimension of the camera perspective information is the same as xp, as shown in equation (2): xp=xp+λ*Ecam_view  (2)
  • 4. The method according to claim 1, wherein the acquisition of the local feature map ƒlocal comprises: rearranging the local feature group ƒƒ_local as follows: ƒlocal=reshape(ƒƒ_local)  (6), wherein reshape(·) is the rearrangement function.
  • 5. The method according to claim 4, wherein the specific implementation process of step (2) comprises steps of step 2-1 to step 2-2: step 2-1: extracting the key-points of the pedestrian images by ViTPose which is pre-trained on the COCO dataset, and the heat map of the pedestrian key-points ƒpos and key-point set Vkc={V1, V2, . . . , VS} in the pedestrian images are obtained by ViTPose, among which, VS is the key-point of the human body obtained by the pedestrian key-point algorithm, as shown in equation (7): ƒpos,Vkc=ViTPose(Image)  (7), wherein VS={kx, ky, kc}, kx, ky are the coordinates of key-points respectively, and kc is the key-point confidence; ƒpos is the heat map of key-points output by ViTPose; step 2-2: obtaining S local features of pedestrian key-points by using the local feature map ƒlocal and the heat map ƒpos according to vector outer product and global average pooling, as shown in equation (8): ƒkeypoints=GAP(ƒlocal⊗ƒpos)  (8), wherein GAP is the global average pooling; the group of features of pedestrian key-points ƒkeypoints ∈ RS×C, S is the number of key-points and C is the number of feature channels.
  • 6. The method according to claim 5, wherein obtaining the local feature group ƒkp_en in step (3) comprises: firstly, the group of features of pedestrian key-points can be expressed as equation (9): ƒkeypoints={ƒkeypointsi|i=0,1, . . . ,S}  (9); secondly, apply 1*1 convolution to each key-point feature and the global feature ƒcls, as shown in equation (10) and equation (11): ƒkp_conv=Conv1×1(ƒkeypoints)  (10), ƒcls_conv=Conv1×1(ƒcls)  (11), wherein ƒkp_conv is the feature after convolution of each local feature, and ƒcls_conv is the feature after convolution of global features; lastly, by using the obtained group of features of pedestrian key-points ƒkeypoints and global features ƒcls, the local feature group of enhanced key-points ƒkp_en is calculated by vector quantity product, softmax and addition, as shown in equation (12) and equation (13): Vsim=Softmax(ƒkp_conv⊙ƒcls_conv)  (12), ƒkp_en=Conv(ƒkeypoints+w*(ƒcls_conv+Vsim*ƒcls))  (13), wherein Conv is the convolution operation; w is the learnable weight; Vsim is the similarity.
  • 7. The method according to claim 6, wherein the method of obtaining the final features of pedestrian key-points ƒƒ_keypoints in step (3) comprises: by using the adjacency matrix of pedestrian key-points A and the local feature group ƒkp_en as the input of the graph convolutional network, outputting the final features of pedestrian key-points ƒƒ_keypoints by the graph convolutional network, as shown in equation (14): ƒƒ_keypoints=GCN(A,ƒkp_en)  (14), wherein GCN is the graph convolutional network, and A is the predefined adjacency matrix of human key-points.
  • 8. The method according to claim 7, wherein the specific implementation process of step (4) comprises steps of step 4-1 to step 4-4: step 4-1: performing global average pooling to the heat map of the key-points ƒpos to obtain the features of pedestrian key-points, and supplementing the features of pedestrian key-points to the local features ƒƒ_local; step 4-2: using the local feature map ƒlocal as a graph structure, wherein the graph comprises H*W nodes and each node is a C-dimensional feature, inputting the local feature map ƒlocal into two 1*1 convolutional networks, and then transposing the output of one of the convolutional networks to construct the relationship between nodes, as shown in equation (15): Ri,j=Conv(ƒlocal)TConv(ƒlocal)  (15), wherein Ri,j is the matrix of relational features, and Conv is a convolutional network; step 4-3: obtaining the features of spatial perception ƒsp of the corresponding relationship by using the matrix of relational features Ri,j, and then embedding the local feature map ƒlocal, the features of pedestrian key-points ƒpos and the features of spatial perception ƒsp into a link, as shown in equation (16) and equation (17): ƒsp=Conv(Ri,j)  (16), ƒconcat=Concat[Conv(ƒlocal),Conv(ƒsp),Conv(ƒpos)]  (17), wherein ƒsp are the features of spatial perception, Concat(·) is the channel link function, and ƒconcat are the connected feature vectors; inputting the ƒconcat into a 1*1 convolutional network and Sigmoid to obtain a spatial attention map ƒatten, and finally a final pedestrian feature map ƒatt_local is obtained by multiplying the spatial attention map ƒatten with the local feature map ƒlocal; step 4-4: constructing multiple classification heads according to the pedestrian structure, and dividing the pedestrian feature map ƒatt_local into four local features, ƒ1, ƒ2, ƒ3, ƒ4, to classify the pedestrian images.
  • 9. The method according to claim 8, wherein the specific implementation process of step (5) comprises steps of step 5-1 to step 5-4: step 5-1: using labeled data in the pedestrian re-identification dataset as supervision information, and using ID loss and difficult triplet loss to train the network for each training batch as shown in equation (18), wherein the ID loss uses cross entropy loss:
Priority Claims (1)
Number Date Country Kind
202211593464.9 Dec 2022 CN national
US Referenced Citations (4)
Number Name Date Kind
10657364 El-Khamy May 2020 B2
11699290 Wang Jul 2023 B1
11835951 Djuric Dec 2023 B2
20220066544 Kwon Mar 2022 A1
Foreign Referenced Citations (8)
Number Date Country
113128461 Jul 2021 CN
113361334 Sep 2021 CN
114120363 Mar 2022 CN
115050048 Sep 2022 CN
115311619 Nov 2022 CN
115497122 Dec 2022 CN
2022174707 Nov 2022 JP
2022236668 Nov 2022 WO
Non-Patent Literature Citations (12)
Entry
Ma et al. “Pose-guided Inter- and Intra-part Relational Transformer for Occluded Person Re-Identification”, MM '21, Oct. 20-24, 2021 (Year: 2021).
Wang et al. “Key point-aware occlusion suppression and semantic alignment for occluded person re-identification”, Information Sciences 606 (2022) 669-687 (Year: 2022).
Liu et al. “Pose-Guided Feature Disentangling for Occluded Person Re-identification Based on Transformer”, The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22) (Year: 2022).
Xu et al. “Learning Feature Recovery Transformer for Occluded Person Re-Identification”, IEEE Transactions on Image Processing, vol. 31, 2022 (Year: 2022).
Li et al. “Diverse Part Discovery: Occluded Person Re-identification with Part-Aware Transformer”, CVPR 2021 (Year: 2021).
Zhou et al. “Motion-Aware Transformer For Occluded Person Re-identification”, 2202.04243v2 [cs.CV] Feb. 10, 2022 (Year: 2022).
Zheng et al. “Person Re-identification: Past, Present and Future”, Journal of Latex Class Files, vol. 14, No. 8, Aug. 2015 (Year: 2015).
Zhao et al. “Short range correlation transformer for occluded person reidentification”, Neural Computing and Applications (2022) 34: 17633-17645 (Year: 2022).
Liu et al. “Survey for person re-identification based on coarse-to-fine feature learning”, Multimedia Tools and Applications (2022) ( Year: 2022).
Jiao Long, “Design and Implementation of Pedestrian Re-identification for Security Monitoring”, China Master's Theses Full-text Database Information Technology Series, Sep. 30, 2021.
Han Zhiwei,, “Person Re-Identification in the Wild”, China Master's Theses Full-text Database Information Technology Series, Jul. 31, 2022.
Shuren Zhou et al., “Occluded person re-identification based on embedded graph matching network for contrastive feature relation”, Theoretical Advances, Nov. 18, 2022.