Occlusion-Guided Feature Purification Learning via Reinforced Knowledge Distillation for Occluded Person Re-Identification
DOI: https://doi.org/10.64509/jicn.12.31
Keywords: Occluded person re-identification; holistic person re-identification; feature purification; reinforcement learning; knowledge distillation
Abstract
Occluded person re-identification aims to retrieve holistic images of a given identity based on occluded person images. Most existing approaches primarily focus on aligning visible body parts using prior information, applying occlusion augmentation to predefined regions, or complementing the missing semantics of occluded body parts with the assistance of holistic images. Nevertheless, they struggle to generalize across diverse occlusion scenarios that are absent from the training data and often overlook the pervasive issue of feature contamination caused by holistic images. In this work, we propose a novel Occlusion-Guided Feature Purification learning framework via Reinforced Knowledge Distillation (OGFR) to address these two issues simultaneously. OGFR adopts a teacher-student distillation architecture that effectively incorporates diverse occlusion patterns into feature representation while transferring purified, discriminative holistic knowledge from the holistic to the occluded branch through reinforced knowledge distillation. Specifically, an Occlusion-Aware Vision Transformer is designed to leverage learnable occlusion pattern embeddings to explicitly model diverse occlusion types, thereby guiding occlusion-aware robust feature representation. Moreover, we devise a Feature Erasing and Purification Module within the holistic branch, in which an agent trained via deep reinforcement learning identifies low-quality patch tokens of holistic images that carry noisy, negative information, and substitutes these patch tokens with learnable embedding tokens to avoid feature contamination and further excavate identity-related discriminative clues. Afterward, with the assistance of knowledge distillation, the student branch effectively absorbs the purified holistic knowledge to learn robust representations regardless of occlusion interference. Extensive experiments validate OGFR: on Occluded-Duke it achieves 76.6% Rank-1 and 64.7% mAP, outperforming the closest Transformer-based method by +3.3% Rank-1 and +2.4% mAP, with consistent gains on other benchmarks.
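
To make the feature-erasing idea above concrete, the following minimal PyTorch-style sketch illustrates how a lightweight policy could score holistic patch tokens, replace the lowest-quality ones with a learnable embedding token, and expose log-probabilities for a REINFORCE-style (policy-gradient) update. It is an illustrative sketch under assumed shapes and names (FeatureErasingPurification, erase_ratio, fill_token are hypothetical), not the authors' implementation.

# Illustrative sketch only: a lightweight policy scores holistic patch tokens,
# low-quality tokens are replaced with a learnable embedding token, and the
# binary erase/keep decisions can be trained with a policy-gradient signal.
# Module and variable names are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn


class FeatureErasingPurification(nn.Module):
    def __init__(self, dim: int, erase_ratio: float = 0.3):
        super().__init__()
        self.erase_ratio = erase_ratio
        # Policy head: one quality logit per patch token.
        self.policy = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))
        # Learnable embedding token that substitutes erased patches.
        self.fill_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, patch_tokens: torch.Tensor):
        # patch_tokens: (B, N, D) holistic patch tokens from the backbone.
        B, N, D = patch_tokens.shape
        logits = self.policy(patch_tokens).squeeze(-1)      # (B, N)
        keep_prob = torch.sigmoid(logits)                   # probability a token is "clean"

        if self.training:
            # Sample Bernoulli keep/erase actions so REINFORCE can be applied.
            actions = torch.bernoulli(keep_prob)
            log_prob = (actions * torch.log(keep_prob + 1e-8)
                        + (1 - actions) * torch.log(1 - keep_prob + 1e-8)).sum(-1)
        else:
            # Deterministically erase the lowest-scoring fraction at test time.
            k = max(1, int(self.erase_ratio * N))
            thresh = keep_prob.kthvalue(k, dim=1, keepdim=True).values
            actions = (keep_prob > thresh).float()
            log_prob = None

        mask = actions.unsqueeze(-1)                         # (B, N, 1)
        purified = mask * patch_tokens + (1 - mask) * self.fill_token
        return purified, log_prob


if __name__ == "__main__":
    fep = FeatureErasingPurification(dim=768)
    tokens = torch.randn(2, 128, 768)
    purified, log_prob = fep(tokens)
    # REINFORCE-style update; in practice the reward would come from the
    # re-identification objective computed on the purified features.
    reward = torch.rand(2)
    loss = -(reward * log_prob).mean()
    loss.backward()
    print(purified.shape, loss.item())

In such a scheme, the reward driving the policy-gradient update would come from the re-identification objective (e.g., identification or metric-learning losses on the purified features), so the agent learns to erase exactly those tokens that contaminate the identity-discriminative representation.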
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.
