Adaptive Multi-Scale Fusion for Infrared and Visible Object Detection in YOLOv8
DOI: https://doi.org/10.53469/jtpes.2024.04(09).04
Keywords: Object detection, YOLOv8, Multi-scale Feature Fusion, Adaptive Modality Weighting, Infrared images
Abstract
Object detection in infrared images is challenging because of varying environmental conditions and the inherent characteristics of thermal data. This paper introduces a Multi-scale Feature Fusion and Adaptive Modality Weighting (MFAW) module, integrated into the YOLOv8 architecture, to improve object detection performance in infrared imagery. By combining the complementary strengths of infrared and visible-light data, the proposed method addresses problems in both feature extraction and cross-modal feature fusion. Experiments on the LLVIP and VEDAI datasets show that our approach outperforms the compared models in mean Average Precision (mAP) across multiple detection scenarios. These results indicate that the MFAW module improves the adaptability and robustness of object detection systems, particularly in low-light conditions.
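The abstract does not specify how MFAW computes its modality weights, so the following is only a minimal sketch of the general idea of adaptive modality weighting applied separately at each scale, not the authors' module. It assumes a simple gate that scores each modality by its mean activation (a stand-in for global average pooling), normalises the two scores with a softmax, and takes a weighted element-wise sum. The function names, the `temperature` parameter, the "P3"/"P4" level names, and the use of flat Python lists in place of tensors are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_weights(ir: Sequence[float], vis: Sequence[float],
                     temperature: float = 1.0) -> List[float]:
    """Hypothetical gate (not the paper's MFAW): score each modality by its
    mean activation, then softmax the scores so the infrared and visible
    weights are positive and sum to 1."""
    scores = [(sum(f) / len(f)) / temperature for f in (ir, vis)]
    return softmax(scores)


def fuse_scale(ir: Sequence[float], vis: Sequence[float],
               temperature: float = 1.0) -> List[float]:
    """Weighted element-wise sum of two same-sized (flattened) feature maps."""
    if len(ir) != len(vis):
        raise ValueError("IR and visible maps must match in size at each scale")
    a_ir, a_vis = modality_weights(ir, vis, temperature)
    return [a_ir * x + a_vis * y for x, y in zip(ir, vis)]


def fuse_multiscale(ir_pyramid: Dict[str, List[float]],
                    vis_pyramid: Dict[str, List[float]],
                    temperature: float = 1.0) -> Dict[str, List[float]]:
    """Fuse each pyramid level independently, so every scale gets its own weights."""
    if ir_pyramid.keys() != vis_pyramid.keys():
        raise ValueError("both pyramids must contain the same scales")
    return {scale: fuse_scale(ir_pyramid[scale], vis_pyramid[scale], temperature)
            for scale in ir_pyramid}


# Usage with hypothetical P3/P4 levels: equal inputs at P3 get equal weights,
# while the stronger infrared response at P4 receives the larger weight.
fused = fuse_multiscale(
    {"P3": [1.0, 1.0], "P4": [2.0, 2.0]},
    {"P3": [1.0, 1.0], "P4": [0.0, 0.0]},
)
print(fused)
```

Computing the weights per scale, rather than once per image, lets the balance between modalities differ between fine and coarse feature levels. In a trained detector the gate would normally be a learned layer rather than a fixed mean-activation score.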
License
Copyright (c) 2024 Zheng Ren
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.