Occluded pedestrian re-identification method based on pose estimation and background suppression

Information

  • Patent Grant
  • Patent Number
    11,908,222
  • Date Filed
    Tuesday, October 17, 2023
  • Date Issued
    Tuesday, February 20, 2024
Abstract
The present application relates to an occluded pedestrian re-identification method, including steps of obtaining global features and local features of occluded pedestrians, and recombining the local features into a local feature map; obtaining a heat map of key-points of pedestrian images and a group of key-point confidences, and obtaining a group of features of the pedestrian key-points by using the local feature map and the heat map; obtaining a local feature group by using the global features, through Conv, to enhance each key-point feature in the group of features of pedestrian key-points, and obtaining an adjacency matrix of key-points through the key-points; the local feature group and the adjacency matrix of key-points are used as the input of a GCN to obtain the final features of pedestrian key-points.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority of Chinese Patent Application No. 202211593464.9, filed on Dec. 13, 2022, which is hereby incorporated by reference in its entirety.


TECHNICAL FIELD

The present invention relates to occluded pedestrian re-identification technology and belongs to the field of computer vision, and more particularly to an occluded pedestrian re-identification method based on pose estimation and background suppression.


BACKGROUND

Pedestrian re-identification, as one of the important research topics in the field of computer vision, aims to correlate pedestrian images captured at different physical locations by different cameras to achieve pedestrian recognition and retrieval across cameras and scenes, and is widely used in smart business, intelligent security and other fields. However, in real scenes, the pedestrian images captured by the cameras are often blocked by objects or other pedestrians, so a robust feature expression of pedestrians cannot be extracted.


The existing occluded person re-identification methods have achieved relatively good results, but they are still plagued by problems caused by occlusion: the features of the unoccluded parts of the pedestrians are the key to the network's identification of pedestrians, and if occlusion features are introduced into the model, the recognition ability will be reduced; matching local pedestrian features can effectively improve the recognition ability of the model, but occlusion will lead to misalignment of the local pedestrian features, resulting in wrong matching of local features. At the same time, an attention mechanism can assign weights to the visible parts of the human body, which can effectively reduce the negative impact of the cluttered background.


Based on the above, the present invention proposes an occluded pedestrian re-identification method based on pose estimation and background suppression.


SUMMARY

The purpose of this invention is to propose an occluded pedestrian re-identification method based on pose estimation and background suppression to address the shortcomings of existing technologies. Firstly, by constructing a graph convolutional module of local feature enhancement, we aim to embed the context information contained in the global features into the local features to enhance the expression of the local features and obtain the connection between feature nodes. At the same time, the heat map of the key-points of the pedestrians and the feature map of the overall pedestrians obtained by the pedestrian pose estimation module are used as the two inputs of the attention-guided background suppression module to further focus the model on the visible parts of pedestrians, so as to obtain a more robust feature expression of pedestrians.


The technical solutions adopted by the invention to solve the technical problems are as follows:


An occluded pedestrian re-identification method based on pose estimation and background suppression is characterized by including the following steps:


Step (1) Construct a pedestrian feature extraction backbone network based on ViT (Visual-Transformer) to obtain the global features ƒcls and the local features ƒƒ_local of occluded pedestrians, and recombine the local features ƒƒ_local into the local feature map ƒlocal.


Step (2) Obtain the heat map of the key-points of the pedestrian images ƒpos and the group of key-point confidences Vkc by the pre-trained ViTPose (Human Pose Estimation), and then obtain the group of features of pedestrian key-points ƒkeypoints by using the local feature map ƒlocal obtained in step (1) and the heat map ƒpos.


Step (3) Construct a graph convolutional module of local feature enhancement composed of Conv and GCN; through Conv, obtain the local feature group ƒkp_en by using the global features ƒcls to enhance each key-point feature in the group of features of pedestrian key-points ƒkeypoints, and obtain the adjacency matrix of key-points A through the key-points; finally, the local feature group ƒkp_en and the adjacency matrix of key-points A are used as the input of the GCN to obtain the final features of pedestrian key-points ƒƒ_keypoints.


Step (4) Construct an attention background suppression module composed of global average pooling and convolutional networks, and then the local feature map ƒlocal obtained in step (1) and the heat map ƒpos obtained in step (2) are input into the attention background suppression module to obtain the output pedestrian features ƒatt_local, which are segmented as the final features.


Step (5) Construct an occluded pedestrian re-identification model (ReID) from the pedestrian feature extraction backbone network, ViTPose, the graph convolutional module of local feature enhancement and the attention background suppression module, and then train the model using the global pedestrian features ƒcls in step (1), the features of pedestrian key-points ƒƒ_keypoints in step (3) and the pedestrian features ƒatt_local in step (4) to obtain the final occluded pedestrian re-identification model.


The beneficial effects of the invention are as follows:


The invention designs a graph convolutional module of local feature enhancement, which uses the context information of global features to enhance the local feature expression of pedestrians and obtain the feature connection between each key-point of pedestrians. This is beneficial for the model to learn the features of the unoccluded pedestrian parts and realize the alignment of features between the pedestrian parts, thereby improving the recognition of the pedestrian features. Secondly, in order to reduce the influence of background information of pedestrian images, the invention designs an attention-guided background suppression module, which guides the model to pay more attention to the distinguishing features related to pedestrian features, so as to obtain more discriminative pedestrian features. The results show that the features extracted by this method have better robustness and effectively improve the generalization ability of the model.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 is the flow chart of the overall implementation scheme of the invention.



FIG. 2 is the schematic diagram of the backbone network of pedestrian feature extraction of the invention.



FIG. 3 is the schematic diagram of pedestrian pose estimation of the invention.



FIG. 4 is the schematic diagram of the convolution module of the local feature enhancement of the invention.



FIG. 5 is the schematic diagram of the attention-guided background suppression module of the invention.



FIG. 6 is the overall structure diagram of the invention.





DETAILED DESCRIPTIONS

The following is a further description of the invention in combination with the attached figures.



FIG. 1 is the flow chart of the overall implementation scheme of the invention, illustrating an occluded pedestrian re-identification method based on pose estimation and background suppression, as shown in FIG. 1, including the following steps:


Step (1) Construct a pedestrian feature extraction backbone network based on Visual-Transformer to obtain the global features ƒcls and the local features ƒƒ_local of occluded pedestrians, and recombine the local features ƒƒ_local into the local feature map ƒlocal;


Step (2) Obtain the heat map of the key-points of the pedestrian images ƒpos and the group of key-point confidences Vkc by the pre-trained pedestrian pose estimation module, and then obtain the group of features of pedestrian key-points ƒkeypoints by using the local feature map ƒlocal obtained in step (1) and the heat map ƒpos;


Step (3) Construct a graph convolutional module of local feature enhancement, and then use the global features ƒcls to enhance each key-point feature in the group of features of pedestrian key-points ƒkeypoints. The enhanced group of features and the adjacency matrix of key-points A are used as the input of the graph convolutional network to obtain the final features of pedestrian key-points ƒƒ_keypoints;


Step (4) Construct an attention-guided background suppression module, and then the local feature map ƒlocal obtained by step (1) and the heat map ƒpos obtained by step (2) are input into the attention background suppression module to obtain the output pedestrian features ƒatt_local, which are segmented as the final features;


Step (5) Train the model using the global pedestrian features ƒcls in step (1), the features of pedestrian key-points ƒƒ_keypoints in step (3) and the pedestrian features ƒatt_local in step (4) to obtain the final occluded pedestrian re-identification model.


Further, the specific implementation process of step (1) is as follows:


1-1 Use the Visual-Transformer (ViT) pre-trained on ImageNet as the backbone network to extract the pedestrian features in the image. Before the images are input into ViT, extract features from the images by a small convolutional network in order to deal with the problem of unstable ViT training, as shown in equation (1):

x=Conv(X)  (1)


here, X represents the pedestrian image, Conv represents the convolutional network, and x is the pedestrian features output by the convolutional network.


1-2 Referring to FIG. 2, the feature map sequence xp={xpi|i=1,2, . . . N} is generated by segmenting the pedestrian features x obtained in 1-1 according to the preset patch size p, where N is the number of separable patches, and then the camera perspective information [CAM_VIEW], whose dimension is the same as xp, is added to xp, as shown in equation (2):

xp=xp+λ*Ecam_view  (2)


here, λ is a hyperparameter representing the weight of the camera perspective information, and Ecam_view is [CAM_VIEW] representing the camera perspective information.


Add [CLS_TOKEN] representing the global feature and the position information coding [POS_TOKEN] to xp, and then the feature Z can be obtained after linear coding, as shown in equation (3):

Z=[xcls;linear(xpi)]+Epos  (3)


here, xcls is the global feature vector [CLS_TOKEN]; linear(·) is a linear coding function; Epos is [POS_TOKEN] representing the spatial position.
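The embedding described in 1-1 and 1-2 can be summarized by the following PyTorch-style sketch. The hidden dimension, patch size, number of patches and number of cameras are illustrative assumptions, and the small convolutional network of equation (1) and the patch split of 1-2 are folded into one strided convolution for brevity, so this is a sketch rather than the exact patented design:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Sketch of equations (1)-(3): convolutional stem, patch sequence,
    camera-view embedding [CAM_VIEW], class token [CLS_TOKEN] and position
    encoding [POS_TOKEN]. All sizes are placeholders, not the patented values."""
    def __init__(self, in_ch=3, dim=768, patch=16, num_patches=128,
                 num_cams=6, lam=3.0):
        super().__init__()
        self.conv_stem = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)  # eq (1)
        self.cam_embed = nn.Parameter(torch.zeros(num_cams, 1, dim))        # E_cam_view
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))               # x_cls
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim)) # E_pos
        self.linear = nn.Linear(dim, dim)                                   # linear(.)
        self.lam = lam                                                      # hyperparameter lambda

    def forward(self, images, cam_ids):
        x = self.conv_stem(images)                  # eq (1): x = Conv(X)
        x = x.flatten(2).transpose(1, 2)            # patch sequence x_p, shape (B, N, C)
        x = x + self.lam * self.cam_embed[cam_ids]  # eq (2): x_p + lambda * E_cam_view
        cls = self.cls_token.expand(x.size(0), -1, -1)
        z = torch.cat([cls, self.linear(x)], dim=1) + self.pos_embed  # eq (3)
        return z
```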


1-3 Input the features Z obtained in 1-2 into ViT, which is a stack of multiple Transformer blocks, as shown in equations (4) and (5):

Zl′=MSA(LN(Zl−1))+Zl−1, l=1 . . . L  (4)
Zl=MLP(LN(Zl′))+Zl′, l=1 . . . L  (5)


here, Zl represents the output features of the Transformer block in layer l, Zl′ is the intermediate result in the Transformer block, and L is the total number of layers; MSA(·) is the multi-head attention, LN(·) is the layer normalization, and MLP(·) is the multilayer perceptron.


The network output is the output features of the last layer, which are the global features ƒcls ∈ R1×C and the local feature group ƒƒ_local. Rearrange the local feature group ƒƒ_local to obtain the local feature map ƒlocal, as shown in equation (6):

ƒlocal=reshape(ƒƒ_local)  (6)


here, reshape(·) is the rearrangement function.
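Equations (4)-(6) correspond to a standard pre-norm Transformer block followed by a token-to-map reshape; a minimal sketch follows, where the head count and MLP ratio are assumptions:

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Sketch of equations (4)-(5): pre-norm multi-head attention and MLP,
    each with a residual connection. Head count and MLP ratio are assumptions."""
    def __init__(self, dim=768, heads=12, mlp_ratio=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                                 nn.Linear(dim * mlp_ratio, dim))

    def forward(self, z):
        h = self.ln1(z)
        z = z + self.msa(h, h, h)[0]   # eq (4): Z'_l = MSA(LN(Z_{l-1})) + Z_{l-1}
        z = z + self.mlp(self.ln2(z))  # eq (5): Z_l  = MLP(LN(Z'_l)) + Z'_l
        return z

def split_outputs(z_last, h, w):
    """Eq (6): take f_cls from the class token and reshape the remaining
    tokens into the local feature map f_local of shape (B, C, h, w)."""
    f_cls = z_last[:, 0]
    f_local = z_last[:, 1:].transpose(1, 2).reshape(z_last.size(0), -1, h, w)
    return f_cls, f_local
```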


Further, the specific implementation process of step (2) is as follows:


2-1 Referring to FIG. 3, extract the key-points of the pedestrian images by ViTPose, which is pre-trained on the COCO dataset. ViTPose outputs the heat map of the pedestrian key-points ƒpos and the coordinate confidences of the key-points Vkc in the pedestrian images. For the pedestrian re-identification task, the final output of the model is selected, namely the heat map of the pedestrian key-points ƒpos and the key-point set Vkc={V1, V2, . . . , VS}, among which VS is a key-point of the human body obtained by the pedestrian key-point algorithm, as shown in equation (7):

ƒpos,Vkc=ViTPose(Image)  (7)


here, VS={kx, ky, kc}, kx, ky are the coordinates of key-points respectively, and kc is the key-point confidence; ƒpos is the heat map of key-points output by ViTPose;


2-2 Using the local feature map ƒlocal obtained in 1-3 and the heat map of the key-points of the pedestrians ƒpos obtained in 2-1, S local features of pedestrian key-points can be obtained by vector outer product and global average pooling, as shown in equation (8):

ƒkeypoints=GAP(ƒlocal⊗ƒpos)  (8)


here, GAP is the global average pooling; the group of features of pedestrian key-points ƒkeypoints ∈ RS×C, S is the number of key-points and C is the number of feature channels.
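Equation (8) can be read as weighting the local feature map by each key-point heat map and then pooling. A minimal sketch is shown below; resizing the heat map to the feature-map resolution with bilinear interpolation is an assumption:

```python
import torch
import torch.nn.functional as F

def keypoint_features(f_local, f_pos):
    """Sketch of equation (8): pool the local feature map with each key-point
    heat map to get one C-dimensional feature per key-point.
    f_local: (B, C, H, W) local feature map; f_pos: (B, S, H, W) heat maps."""
    f_pos = F.interpolate(f_pos, size=f_local.shape[-2:], mode='bilinear',
                          align_corners=False)
    # Outer-product-style weighting followed by global average pooling:
    # f_keypoints[b, s, c] = mean over (h, w) of f_local[b, c, h, w] * f_pos[b, s, h, w]
    f_keypoints = torch.einsum('bchw,bshw->bsc', f_local, f_pos) / (
        f_local.shape[-2] * f_local.shape[-1])
    return f_keypoints  # (B, S, C)
```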


Further, the specific implementation process of step (3) is as follows:


3-1 In the case of occlusion, the local features extracted in 2-2 often cannot represent the unoccluded part of the pedestrians robustly, while the context information contained in the global features can further enhance the expression of local features. Therefore, the relationship between global features and local features is used to further enrich local features. Firstly, the group of features of pedestrian key-points can be expressed as equation (9):

ƒkeypointskeypointsi|i=0,1, . . . ,S}  (9)


Secondly, apply a 1*1 convolution to each key-point feature and the global feature ƒcls, as shown in equations (10) and (11):

ƒkp_conv=Conv1×1(ƒkeypoints)  (10)
ƒcls_conv=Conv1×1(ƒcls)  (11)


here, ƒkp_conv is the feature after convolution of each local feature, and ƒcls_conv is the feature after convolution of global features.


3-2 By using the features of key-points and global features obtained in 3-1, the local feature group of enhanced key-points ƒkp_en is calculated by vector quantity product, softmax and addition, as shown in equations (12) and (13):

Vsim=Softmax(ƒkp_conv⊙ƒcls_conv)  (12)
ƒkp_en=Conv(ƒkeypoints+w*(ƒcls_conv+Vsim*ƒcls))  (13)


here, Conv is the convolution operation; w is the learnable weight; Vsim is the similarity.


3-3 By using the predefined adjacency matrix of pedestrian key-points A and the local feature group ƒkp_en obtained in 3-2 as the input of the graph convolutional network, output the final features of pedestrian key-points ƒƒ_keypoints by the graph convolutional network, as shown in equation (14):

ƒƒ_keypoints=GCN(A,ƒkp_en)  (14)


here, GCN is the graph convolutional network, and A is the predefined adjacency matrix of human key-points.


3-4 The processes described in 3-1, 3-2 and 3-3 constitute the graph convolutional module of local feature enhancement, with reference to FIG. 4.
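A minimal sketch of the module in 3-1 to 3-3 follows. The reading of the product in equation (12) as a channel-wise dot product followed by a softmax over key-points, the single-layer row-normalized GCN, and the layer sizes are assumptions rather than the exact patented design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalFeatureEnhanceGCN(nn.Module):
    """Sketch of equations (10)-(14): enhance each key-point feature with the
    global feature, then run one graph-convolution layer over the predefined
    key-point adjacency matrix A. Layer sizes are illustrative assumptions."""
    def __init__(self, dim=768):
        super().__init__()
        self.kp_conv = nn.Conv1d(dim, dim, kernel_size=1)    # eq (10), 1*1 conv
        self.cls_conv = nn.Linear(dim, dim)                   # eq (11)
        self.fuse_conv = nn.Conv1d(dim, dim, kernel_size=1)   # Conv in eq (13)
        self.w = nn.Parameter(torch.tensor(1.0))              # learnable weight w
        self.gcn_weight = nn.Linear(dim, dim)                 # GCN layer weight

    def forward(self, f_keypoints, f_cls, A):
        # f_keypoints: (B, S, C); f_cls: (B, C); A: (S, S) adjacency matrix
        f_kp_conv = self.kp_conv(f_keypoints.transpose(1, 2)).transpose(1, 2)
        f_cls_conv = self.cls_conv(f_cls).unsqueeze(1)                       # (B, 1, C)
        # eq (12): similarity between each key-point feature and the global feature
        v_sim = F.softmax((f_kp_conv * f_cls_conv).sum(-1, keepdim=True), dim=1)
        # eq (13): add the global context back into each key-point feature
        f_kp_en = f_keypoints + self.w * (f_cls_conv + v_sim * f_cls.unsqueeze(1))
        f_kp_en = self.fuse_conv(f_kp_en.transpose(1, 2)).transpose(1, 2)
        # eq (14): one graph-convolution step over the row-normalized adjacency
        a_hat = A / A.sum(-1, keepdim=True).clamp(min=1)
        f_f_keypoints = F.relu(self.gcn_weight(torch.matmul(a_hat, f_kp_en)))
        return f_f_keypoints  # (B, S, C)
```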


Further, the specific implementation process of step (4) is as follows:


4-1 In order to guide the attention to further focus on the unoccluded parts of the pedestrians, thereby suppressing the cluttered background, the global average pooling of the heat map of the key-points output by the pose estimation network is used as the features of pedestrian key-points, which are supplemented to the local features.


4-2 The local feature map output by the backbone network is treated as a graph structure, meaning there are H*W nodes in the graph and each node is a C-dimensional feature. First, input the local feature map ƒlocal into two 1*1 convolutional networks, and then transpose the output of one of the convolutional networks to construct the relationship between nodes, as shown in equation (15):

Ri,j=Conv(ƒlocal)T Conv(ƒlocal)  (15)


here, Ri,j is the matrix of relational feature, and Conv is a convolutional network.


4-3 The matrix of relational features Ri,j is used to obtain the features of spatial perception ƒsp of the corresponding relationship, and then the local feature map ƒlocal, the features of pedestrian key-points ƒpos in 4-1 and the features of spatial perception ƒsp are embedded into the link, as shown in equations (16) and (17):

ƒsp=Conv(Ri,j)  (16)
ƒconcat=Concat[Conv(ƒlocal),Conv(ƒsp),Conv(ƒpos)]  (17)


here, ƒsp are the features of spatial perception, Concat(·) is the channel link function, and ƒconcat are the connected feature vectors.


Input ƒconcat into a 1*1 convolutional network and a Sigmoid to obtain the spatial attention map ƒatten, and finally the final pedestrian feature map ƒatt_local is obtained by multiplying the spatial attention map ƒatten with the local feature map ƒlocal.


4-4 After that, construct multiple classification heads according to the pedestrian structure, and the pedestrian feature map ƒatt_local is divided into four local features ƒ1, ƒ2, ƒ3, ƒ4 to classify the pedestrian images.


4-5 The process described in 4-1, 4-2, 4-3, 4-4 constitutes the attention-guided background suppression module, with reference to FIG. 5.
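A minimal sketch of 4-1 to 4-3 is given below. The way the relation matrix Ri,j is collapsed into a per-node spatial map before equation (16), the resizing of the heat map, and the channel widths are assumptions, not the exact patented layout:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackgroundSuppression(nn.Module):
    """Sketch of equations (15)-(17): node-relation matrix, spatial-perception
    features, concatenation with the pose heat map, and a sigmoid spatial
    attention map that re-weights f_local. Channel widths are placeholders."""
    def __init__(self, dim=768, pose_ch=17, mid=256):
        super().__init__()
        self.q = nn.Conv2d(dim, mid, 1)
        self.k = nn.Conv2d(dim, mid, 1)
        self.sp = nn.Conv2d(1, mid, 1)          # Conv on R_{i,j}, eq (16)
        self.local_proj = nn.Conv2d(dim, mid, 1)
        self.pos_proj = nn.Conv2d(pose_ch, mid, 1)
        self.attn = nn.Conv2d(3 * mid, 1, 1)    # 1*1 conv before the sigmoid

    def forward(self, f_local, f_pos):
        b, c, h, w = f_local.shape
        f_pos = F.interpolate(f_pos, size=(h, w), mode='bilinear', align_corners=False)
        q = self.q(f_local).flatten(2)                      # (B, mid, HW)
        k = self.k(f_local).flatten(2)                      # (B, mid, HW)
        r = torch.bmm(q.transpose(1, 2), k)                 # eq (15): (B, HW, HW)
        # Collapse R_{i,j} to one response per node (an assumed reduction),
        # then apply the convolution of eq (16):
        f_sp = self.sp(r.mean(-1).view(b, 1, h, w))
        f_concat = torch.cat([self.local_proj(f_local), f_sp,
                              self.pos_proj(f_pos)], dim=1)  # eq (17)
        f_atten = torch.sigmoid(self.attn(f_concat))         # spatial attention map
        return f_local * f_atten                             # f_att_local
```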


Further, the specific implementation process of step (5) is as follows:


5-1 The labeled data in the pedestrian re-identification dataset is used as the supervision information, and for each training batch the network is trained with the ID loss, which uses the cross-entropy loss shown in equation (18), together with the hard triplet loss:










Lid=Σi=1N−qi log(pi)  (18)







here, N is the number of pedestrian categories, qi is the supervised label and pi is the predictive label.


The hard triplet loss randomly samples P identities, and for each identity extracts K instances to form a mini-batch of size P*K; each image xa in the batch is selected in turn as the anchor, and the farthest positive sample xp and the nearest negative sample xn in the batch are selected to form a triplet to train the network, in order to enhance the generalization ability of the network, as shown in equation (19):










Ltriplet=Σi=1PΣa=1K[m+max1≤p≤K∥ƒ(xi,a)−ƒ(xi,p)∥2−minj=1 . . . P, n=1 . . . K, j≠i∥ƒ(xi,a)−ƒ(xj,n)∥2]+  (19)

here, the max term corresponds to the hardest positive sample for the anchor and the min term corresponds to the hardest negative sample.
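For reference, a minimal PyTorch sketch of the two training losses is shown below. The batch-hard formulation averages over anchors rather than summing, and the margin value is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def id_loss(logits, labels):
    """Eq (18): cross-entropy ID loss over the N pedestrian categories."""
    return F.cross_entropy(logits, labels)

def hard_triplet_loss(features, labels, margin=0.3):
    """Sketch of eq (19): batch-hard triplet loss. For each anchor, take the
    farthest positive and the nearest negative in the P*K mini-batch."""
    dist = torch.cdist(features, features, p=2)              # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)         # (B, B) same-identity mask
    pos_dist = dist.masked_fill(~same, float('-inf')).max(dim=1).values  # hardest positive
    neg_dist = dist.masked_fill(same, float('inf')).min(dim=1).values    # hardest negative
    return F.relu(margin + pos_dist - neg_dist).mean()
```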







5-2 FIG. 6 shows the overall architecture of the network. Using the global features ƒcls obtained in step (1), the final group of features of pedestrian key-points ƒƒ_keypoints obtained in 3-3, and the local features ƒ1, ƒ2, ƒ3, ƒ4 obtained by dividing the pedestrian features ƒatt_local in 4-4, train the occluded pedestrian re-identification model to obtain the final model; the overall loss can be expressed as follows:









Loss=Lid(ƒcls)+Ltriplet(ƒcls)+(1/S)Σi=1S kci[Lid(ƒƒ_keypointsi)+Ltriplet(ƒƒ_keypointsi)]+(1/k)Σi=1k(Lid(ƒi)+Ltriplet(ƒi))  (20)







here, S is the number of pedestrian key-points, and kc is the key-point confidence obtained in 2-1.
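A minimal sketch of how the branch losses of equation (20) could be combined is given below. The per-branch logits and features, the averaging of each key-point confidence over the batch, and the helper functions id_loss and hard_triplet_loss from the sketch after equation (19) are assumptions:

```python
def total_loss(f_cls_logits, f_cls_feat, kp_logits, kp_feats, kc,
               part_logits, part_feats, labels, margin=0.3):
    """Sketch of eq (20): global-branch loss, confidence-weighted key-point
    losses, and the averaged losses of the k divided local features.
    kc: (B, S) key-point confidences; kp_* and part_* are per-branch lists."""
    loss = id_loss(f_cls_logits, labels) + hard_triplet_loss(f_cls_feat, labels, margin)
    s = len(kp_feats)                           # number of key-points S
    loss = loss + sum(
        kc[:, i].mean() * (id_loss(kp_logits[i], labels) +
                           hard_triplet_loss(kp_feats[i], labels, margin))
        for i in range(s)) / s
    k = len(part_feats)                         # divided local features f_1 .. f_k
    loss = loss + sum(
        id_loss(part_logits[i], labels) + hard_triplet_loss(part_feats[i], labels, margin)
        for i in range(k)) / k
    return loss
```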


5-3 When training is stable, the final ReID model is obtained. In the test stage, the query image q and the test set image t are input into the final occluded pedestrian re-identification model for feature extraction to obtain their key-point features ƒq and ƒt respectively, and then graph matching is used to optimize the comparison, as shown in equation (21):









GM=(1/S)Σi=1S kc_qi kc_ti cosine(ƒqi,ƒti)  (21)







here, kc_qi and kc_ti are the i-th key-point confidences of the image q and t respectively; cosine is the cosine distance.
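A minimal sketch of the matching score in equation (21) follows; it uses cosine similarity (higher means more alike) in place of the cosine distance named above, which is an assumption about the intended sign of the score:

```python
import torch
import torch.nn.functional as F

def graph_matching_score(f_q, f_t, kc_q, kc_t):
    """Sketch of eq (21): confidence-weighted mean cosine similarity between
    the S key-point features of a query image and a test (gallery) image.
    f_q, f_t: (S, C) key-point features; kc_q, kc_t: (S,) confidences."""
    cos = F.cosine_similarity(f_q, f_t, dim=1)   # (S,) per-key-point similarity
    return (kc_q * kc_t * cos).mean()
```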


5-4 Compare the query image features with the test set image features to determine whether the images belong to the same identity, and output the pedestrian images of the same identity.

Claims
  • 1. An occluded pedestrian re-identification method based on pose estimation and background suppression, comprising steps of: step (1): constructing a pedestrian feature extraction backbone network based on ViT to obtain global features ƒcls and local features ƒƒ_local of occluded pedestrians, and recombining the local features ƒƒ_local into a local feature map ƒlocal; step (2): obtaining a heat map of key-points of pedestrian images ƒpos and a group of key-point confidences Vkc according to a pre-trained ViTPose, and then obtaining a group of features of pedestrian key-points ƒkeypoints by using the local feature map ƒlocal and the heat map ƒpos; step (3): constructing a graph convolutional module of local feature enhancement composed of Conv and GCN, and then obtaining a local feature group ƒkp_en by using the global features ƒcls to enhance each key-point feature in the group of features of pedestrian key-points ƒkeypoints through Conv, and obtaining an adjacency matrix of key-points A through the key-points, and finally the local feature group ƒkp_en and the adjacency matrix of key-points A are used as input of GCN to obtain final features of pedestrian key-points ƒƒ_keypoints; step (4): constructing an attention background suppression module composed of global average pooling and convolutional networks, inputting the local feature map ƒlocal obtained by step (1) and the heat map ƒpos obtained by step (2) into the attention background suppression module to output pedestrian features ƒatt_local, segmenting the pedestrian features as final features; step (5): constructing an occluded pedestrian re-identification model by the pedestrian feature extraction backbone network, the ViTPose, the graph convolutional module of local feature enhancement and the attention background suppression module, and then training the occluded pedestrian re-identification model by using the global pedestrian features ƒcls, the features of pedestrian key-points ƒƒ_keypoints and the pedestrian features ƒatt_local to obtain a final occluded pedestrian re-identification model.
  • 2. The method according to claim 1, wherein the pedestrian feature extraction backbone network is obtained based on an initial network of pedestrian feature extraction which adopts the ViT pre-trained on ImageNet, wherein before the images are input into ViT, extracting features from the images by a convolutional network, as shown in equation (1): x=Conv(X)  (1), wherein X represents the pedestrian image, Conv represents the convolutional network, and x is the pedestrian features output by the convolutional network.
  • 3. The method according to claim 2, wherein the step of obtaining the global features ƒcls and the local features ƒƒ_local of occluded pedestrians comprises: generating a feature map sequence xp={xpi|i=1,2, . . . N} by segmenting the obtained pedestrian features x according to a preset patch size p, wherein N is a separable quantity, and then adding camera perspective information [CAM_VIEW] to xp, the dimension of the camera perspective information is the same as xp, as shown in equation (2): xp=xp+λ*Ecam_view  (2)
  • 4. The method according to claim 1, wherein the acquisition of the local feature map ƒlocal comprises: rearranging the local feature group ƒƒ_local as follows: ƒlocal=reshape(ƒƒ_local)  (6), wherein reshape(·) is the rearrangement function.
  • 5. The method according to claim 4, wherein the specific implementation process of step (2) comprises steps of step 2-1 to step 2-2: step 2-1: extracting the key-points of the pedestrian images by ViTPose which is pre-trained on the COCO dataset, and the heat map of the pedestrian key-points ƒpos and key-point set Vkc={V1, V2, . . . , VS} in the pedestrian images are obtained by ViTPose, among which, VS is the key-point of the human body obtained by the pedestrian key-point algorithm, as shown in equation (7): ƒpos,Vkc=ViTPose(Image)  (7), wherein VS={kx, ky, kc}, kx, ky are the coordinates of key-points respectively, and kc is the key-point confidence; ƒpos is the heat map of key-points output by ViTPose; step 2-2: obtaining S local features of pedestrian key-points by using the local feature map ƒlocal and the heat map ƒpos according to vector outer product and global average pooling, as shown in equation (8): ƒkeypoints=GAP(ƒlocal⊗ƒpos)  (8), wherein GAP is the global average pooling; the group of features of pedestrian key-points ƒkeypoints ∈ RS×C, S is the number of key-points and C is the number of feature channels.
  • 6. The method according to claim 5, wherein obtaining the local feature group ƒkp_en in step (3) comprises: firstly, the group of features of pedestrian key-points can be expressed as equation (9): ƒkeypoints={ƒkeypointsi|i=0,1, . . . ,S}  (9); secondly, apply 1*1 convolution to each key-point feature and the global feature ƒcls, as shown in equation (10) and equation (11): ƒkp_conv=Conv1×1(ƒkeypoints)  (10), ƒcls_conv=Conv1×1(ƒcls)  (11), wherein ƒkp_conv is the feature after convolution of each local feature, and ƒcls_conv is the feature after convolution of global features; lastly, by using the obtained group of features of pedestrian key-points ƒkeypoints and global features ƒcls, the local feature group of enhanced key-points ƒkp_en is calculated by vector quantity product, softmax and addition, as shown in equation (12) and equation (13): Vsim=Softmax(ƒkp_conv⊙ƒcls_conv)  (12), ƒkp_en=Conv(ƒkeypoints+w*(ƒcls_conv+Vsim*ƒcls))  (13), wherein Conv is the convolution operation; w is the learnable weight; Vsim is the similarity.
  • 7. The method according to claim 6, wherein the method of obtaining the final features of pedestrian key-points ƒƒ_keypoints in step (3) comprises: by using the adjacency matrix of pedestrian key-points A and the local feature group ƒkp_en as the input of the graph convolutional network, outputting the final features of pedestrian key-points ƒƒ_keypoints by the graph convolutional network, as shown in equation (14): ƒƒ_keypoints=GCN(A,ƒkp_en)  (14), wherein GCN is the graph convolutional network, and A is the predefined adjacency matrix of human key-points.
  • 8. The method according to claim 7, wherein the specific implementation process of step (4) comprises steps of step 4-1 to step 4-4: step 4-1: performing global average pooling to the heat map of the key-points ƒpos to obtain the features of pedestrian key-points, and supplementing the features of pedestrian key-points to the local features ƒƒ_local; step 4-2: using the local feature map ƒlocal as a graph structure, wherein the graph comprises H*W nodes and each node is a C-dimensional feature, inputting the local feature map ƒlocal into two 1*1 convolutional networks, and then transposing the output of one of the convolutional networks to construct the relationship between nodes, as shown in equation (15): Ri,j=Conv(ƒlocal)TConv(ƒlocal)  (15), wherein Ri,j is the matrix of relational features, and Conv is a convolutional network; step 4-3: obtaining the features of spatial perception ƒsp of the corresponding relationship by using the matrix of relational features Ri,j, and then embedding the local feature map ƒlocal, the features of pedestrian key-points ƒpos and the features of spatial perception ƒsp into a link, as shown in equation (16) and equation (17): ƒsp=Conv(Ri,j)  (16), ƒconcat=Concat[Conv(ƒlocal),Conv(ƒsp),Conv(ƒpos)]  (17), wherein ƒsp are the features of spatial perception, Concat(·) is the channel link function, and ƒconcat are the connected feature vectors; inputting the ƒconcat into a 1*1 convolutional network and Sigmoid to obtain a spatial attention map ƒatten, and finally a final pedestrian feature map ƒatt_local is obtained by multiplying the spatial attention map ƒatten with the local feature map ƒlocal; step 4-4: constructing multiple classification heads according to the pedestrian structure, and dividing the pedestrian feature map ƒatt_local into four local features, ƒ1, ƒ2, ƒ3, ƒ4, to classify the pedestrian images.
  • 9. The method according to claim 8, wherein the specific implementation process of step (5) comprises steps of step 5-1 to step 5-4: step 5-1: using labeled data in the pedestrian re-identification dataset as supervision information, and using ID loss and difficult triplet loss to train the network for each training batch as shown in equation (18), wherein the ID loss uses cross entropy loss:
Priority Claims (1)
Number Date Country Kind
202211593464.9 Dec 2022 CN national
US Referenced Citations (4)
Number Name Date Kind
10657364 El-Khamy May 2020 B2
11699290 Wang Jul 2023 B1
11835951 Djuric Dec 2023 B2
20220066544 Kwon Mar 2022 A1
Foreign Referenced Citations (8)
Number Date Country
113128461 Jul 2021 CN
113361334 Sep 2021 CN
114120363 Mar 2022 CN
115050048 Sep 2022 CN
115311619 Nov 2022 CN
115497122 Dec 2022 CN
2022174707 Nov 2022 JP
2022236668 Nov 2022 WO
Non-Patent Literature Citations (12)
Entry
Ma et al. “Pose-guided Inter- and Intra-part Relational Transformer for Occluded Person Re-Identification”, MM '21, Oct. 20-24, 2021 (Year: 2021).
Wang et al. “Key point-aware occlusion suppression and semantic alignment for occluded person re-identification”, Information Sciences 606 (2022) 669-687 (Year: 2022).
Liu et al. “Pose-Guided Feature Disentangling for Occluded Person Re-identification Based on Transformer”, The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22) (Year: 2022).
Xu et al. “Learning Feature Recovery Transformer for Occluded Person Re-Identification”, IEEE Transactions on Image Processing, vol. 31, 2022 (Year: 2022).
Li et al. “Diverse Part Discovery: Occluded Person Re-identification with Part-Aware Transformer”, CVPR 2021 (Year: 2021).
Zhou et al. “Motion-Aware Transformer For Occluded Person Re-identification”, 2202.04243v2 [cs.CV] Feb. 10, 2022 (Year: 2022).
Zheng et al. “Person Re-identification: Past, Present and Future”, Journal of Latex Class Files, vol. 14, No. 8, Aug. 2015 (Year: 2015).
Zhao et al. “Short range correlation transformer for occluded person reidentification”, Neural Computing and Applications (2022) 34: 17633-17645 (Year: 2022).
Liu et al. “Survey for person re-identification based on coarse-to-fine feature learning”, Multimedia Tools and Applications (2022) ( Year: 2022).
Jiao Long, “Design and Implementation of Pedestrian Re-identification for Security Monitoring”, China Master's Theses Full-text Database Information Technology Series, Sep. 30, 2021.
Han Zhiwei,, “Person Re-Identification in the Wild”, China Master's Theses Full-text Database Information Technology Series, Jul. 31, 2022.
Shuren Zhou et al., “Occluded person re-identification based on embedded graph matching network for contrastive feature relation”, Theoretical Advances, Nov. 18, 2022.