Body Keypoint Detection Algorithm Based on Channel Attention Mechanism

Shaojun Yu; Wenhao Huo; Yuping Lu; Hanqing Zhao; Yilin Wang; Lili Wang; Rizwan Anjum Muhammad

doi:10.64509/jicn.21.98

Authors

Shaojun Yu, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
Wenhao Huo, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
Yuping Lu, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
Hanqing Zhao, School of Physical Education, Beijing Jiaotong University, Beijing 100044, China
Yilin Wang, School of Physical Education, Beijing Jiaotong University, Beijing 100044, China
Lili Wang, School of Physical Education, Beijing Jiaotong University, Beijing 100044, China
Rizwan Anjum Muhammad, Department of Electronic Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan

DOI:

https://doi.org/10.64509/jicn.21.98

Keywords:

Body Keypoint Detection, Channel Attention Mechanism, BlazePose, Motion Analysis, Physical Fitness Assessment

Abstract

With the implementation of national strategies aimed at building a leading sporting nation and promoting nationwide fitness, physical fitness assessment has gained increasing attention as a crucial metric for evaluating students' physical condition and motor abilities. Concurrently, advances in computer vision have enabled body keypoint detection to gradually replace traditional manual measurement, showing significant potential for automated assessment systems. Accurate keypoint recognition is the foundation of intelligent physical fitness testing and smart sports. However, in physical fitness test scenarios, existing keypoint detection algorithms often suffer from drift of extremity keypoints, such as those of the hands and feet, which compromises assessment accuracy. To address this challenge, this paper proposes Channel Attention BlazePose (CA-BlazePose), a body keypoint detection algorithm based on a channel attention mechanism and designed for count-based physical fitness tests, namely sit-ups and pull-ups. CA-BlazePose employs a two-stage network architecture, consisting of heatmap training followed by regression fine-tuning, and incorporates a channel attention module. This module strengthens feature extraction for extremity keypoints such as the hands and feet, thereby improving recognition accuracy. Experimental results demonstrate that, compared with mainstream keypoint detection algorithms such as OpenPose and BlazePose, CA-BlazePose improves the Percentage of Correct Keypoints (PCK) on two representative motion datasets, Common Objects in Context (COCO) and Leeds Sports Pose Extended (LSPET), by approximately 7% for hand and foot keypoints and 8% for all keypoints.
Furthermore, in real-time detection tests of sit-ups and pull-ups captured from various viewing angles, CA-BlazePose handles frames with missing or drifting keypoints better than existing algorithms and gives more stable recognition under identical detection conditions.
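
For readers unfamiliar with the metric behind the reported gains, the following is a minimal sketch of how PCK (Percentage of Correct Keypoints) is commonly computed: a predicted keypoint counts as correct when it lies within a fraction `alpha` of a reference length (for example the torso or bounding-box size) from the ground truth. The function name `pck`, the threshold `alpha=0.2`, the reference length, and the coordinates are all illustrative assumptions; the abstract does not state which normalization or threshold the paper uses.

```python
import math


def pck(pred, truth, ref_length, alpha=0.2):
    """Fraction of keypoints predicted within alpha * ref_length of the truth.

    pred, truth: equal-length lists of (x, y) pixel coordinates.
    ref_length:  normalizing length in pixels (assumed here; conventions
                 differ between datasets and papers).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of keypoints")
    if not pred:
        raise ValueError("at least one keypoint is required")

    threshold = alpha * ref_length
    correct = sum(
        1
        for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= threshold
    )
    return correct / len(pred)


# Hypothetical ground truth and predictions for four keypoints.
truth = [(100.0, 50.0), (120.0, 80.0), (90.0, 200.0), (130.0, 210.0)]
pred = [(102.0, 51.0), (121.0, 79.0), (90.0, 230.0), (131.0, 212.0)]

# The third keypoint has drifted 30 px, beyond the 20 px threshold.
score = pck(pred, truth, ref_length=100.0, alpha=0.2)
print(f"PCK@0.2 = {score:.2f}")
```

In this toy case one drifting keypoint out of four lowers the score to 0.75, which illustrates why drift of hand and foot keypoints shows up directly in PCK.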

References

[1] Voeikov, R., Falaleev, N., Baikulov, R.: TTNet: Real-time temporal and spatial video analysis of table tennis. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3866-3874 (2020). https://doi.org/10.1109/CVPRW50498.2020.00450

[2] Ma, Y., Li, H., Yan, H.: Efficient real-time sports action pose estimation via EfficientPose and temporal graph convolution. IEEE Access 13, 39901-39911 (2025). https://doi.org/10.1109/ACCESS.2025.3542240

[3] Colyer, S.L., Evans, M., Cosker, D.P., Salo, A.T.: A Review of the Evolution of Vision-Based Motion Analysis and the Integration of Advanced Computer Vision Methods Towards Developing a Markerless System. Sports Medicine - Open 4(1), 24 (2018). https://doi.org/10.1186/s40798-018-0139-y

[4] Shi, Z., Zhao, H., Chen, J., Cheng, G.: Research on sit-up counting method and system based on human skeleton key point detection. Quality in Sport 24, 55408 (2024). https://doi.org/10.12775/QS.2024.24.55408

[5] Song, Z., Chen, Z.: Sports action detection and counting algorithm based on pose estimation and its application in physical education teaching. Informatica 48(10), 35-50 (2024). https://doi.org/10.31449/inf.v48i10.5918

[6] Guo, T., Yin, Q., Liu, X., Sun, Y., Qin, Z., Yu Han, Y., Lu, G.: Fitness exercise evaluation system based on improved DTW algorithm. Scientific Reports 15(1), 19961 (2025). https://doi.org/10.1038/s41598-025-02535-5

[7] Lu, J., Yang, T., Zhao, B., Wang, H., Luo, M., Zhou, Y., Li, Z.: Review of deep learning-based human pose estimation methods. Laser & Optoelectronics Progress 58(24), 2400005 (2021). https://doi.org/10.3788/LOP202158.2400005

[8] Ramanan, D., Forsyth, D.A., Zisserman, A.: Strike a pose: Tracking people by finding stylized poses. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp. 271-278 (2005). https://doi.org/10.1109/CVPR.2005.335

[9] Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In Conference on Computer Vision and Pattern Recognition (CVPR 2011), pp. 1385-1392 (2011). https://doi.org/10.1109/CVPR.2011.5995741

[10] Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R.: Real-time human pose recognition in parts from single depth images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), pp. 1297-1304 (2011). https://doi.org/10.1109/CVPR.2011.5995316

[11] Newell, A., Yang, K., Deng, J.: Stacked Hourglass Networks for Human Pose Estimation. In European Conference on Computer Vision, pp. 483-499 (2016). https://doi.org/10.1007/978-3-319-46484-8_29

[12] Cao, Z., Hidalgo, G., Simon, T., Wei, S., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(1), 172-186 (2021). https://doi.org/10.1109/TPAMI.2019.2929257

[13] Sun, X., Xiao, B., Wei, F., Liang, S., Wei, Y.: Integral Human Pose Regression. In Proceedings of the European Conference on Computer Vision (ECCV 2018), pp. 536-553 (2018). https://doi.org/10.1007/978-3-030-01231-1_33

[14] Fang, H.S., Xie, S., Tai, Y.W., Lu, C.: RMPE: Regional multi-person pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2353-2362 (2017). https://doi.org/10.1109/ICCV.2017.256

[15] Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high resolution representation learning for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5686-5696 (2019). https://doi.org/10.1109/CVPR.2019.00584

[16] Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7103-7112 (2018). https://doi.org/10.1109/CVPR.2018.00742

[17] Kendall, A., Grimes, M., Cipolla, R.: PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In 2015 IEEE International Conference on Computer Vision (ICCV), pp. 2938-2946 (2015). https://doi.org/10.1109/ICCV.2015.336

[18] Xiao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 472-487 (2018). https://doi.org/10.1007/978-3-030-01231-1_29

[19] Bazarevsky, V., Grishchenko, I., Raveendran, K., Grundmann, M., Zhang, F., Zhu, T.: BlazePose: On-device real-time body pose tracking. In CVPR 2020 Workshop on Computer Vision for Augmented and Virtual Reality, pp. 1-4 (2020).

[20] Hulleck, A.A., Alshehhi, A., El Rich, M., Khan, R., Katmah, R., Mohseni, M.: BlazePose-Seq2Seq: Leveraging regular RGB cameras for robust gait assessment. IEEE Transactions on Neural Systems and Rehabilitation Engineering 32, 1715-1724 (2024). https://doi.org/10.1109/TNSRE.2024.3391908

[21] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV 2014), pp. 740-755 (2014). https://doi.org/10.1007/978-3-319-10602-1_48

[22] Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In Proceedings of the British Machine Vision Conference, pp. 12.1-12.11 (2010). https://doi.org/10.5244/C.24.12