Adaptive Multi-Scale Fusion for Infrared and Visible Object Detection in YOLOv8

Authors

  • Zheng Ren, College of Computing, Georgia Institute of Technology, North Avenue, Atlanta, GA 30332

DOI:

https://doi.org/10.53469/jtpes.2024.04(09).04

Keywords:

Object detection, YOLOv8, Multi-scale Feature Fusion, Adaptive Modality Weighting, Infrared images

Abstract

Object detection in infrared images presents unique challenges due to varying environmental conditions and the inherent characteristics of thermal data. This paper introduces a novel Multi-scale Feature Fusion and Adaptive Modality Weighting (MFAW) module integrated into the YOLOv8 architecture to enhance object detection performance in infrared imagery. By leveraging the strengths of both infrared and visible light data, the proposed method effectively addresses issues related to feature extraction and fusion. Comprehensive experiments conducted on the LLVIP and VEDAI datasets demonstrate that our approach significantly outperforms existing models in terms of mean Average Precision (mAP), achieving superior accuracy across multiple detection scenarios. The results indicate the effectiveness of the MFAW module in improving the adaptability and robustness of object detection systems, particularly in low-light conditions.
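The abstract does not give the internals of the MFAW module, so the following is only a minimal sketch of the general idea of adaptive modality weighting applied at several scales. Everything in it is an assumption made for illustration: the function names (`modality_weights`, `fuse_scale`, `fuse_pyramid`), the use of mean absolute activation as the gating score, and the softmax gate itself are hypothetical, and a trained module would learn its weights rather than compute them from a fixed statistic.

```python
import math


def mean_abs(feature_map):
    """Mean absolute activation of a 2-D feature map given as a list of rows."""
    values = [abs(v) for row in feature_map for v in row]
    return sum(values) / len(values)


def modality_weights(ir_map, vis_map):
    """Return (w_ir, w_vis), a softmax over one score per modality.

    Assumption: the score is the mean absolute activation, a hand-set
    stand-in for whatever gating signal a learned module would produce.
    """
    scores = [mean_abs(ir_map), mean_abs(vis_map)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return exps[0] / total, exps[1] / total


def fuse_scale(ir_map, vis_map):
    """Blend two same-sized feature maps with the adaptive weights."""
    w_ir, w_vis = modality_weights(ir_map, vis_map)
    return [
        [w_ir * a + w_vis * b for a, b in zip(row_ir, row_vis)]
        for row_ir, row_vis in zip(ir_map, vis_map)
    ]


def fuse_pyramid(ir_pyramid, vis_pyramid):
    """Apply the per-scale fusion independently at every pyramid level."""
    return [fuse_scale(ir, vis) for ir, vis in zip(ir_pyramid, vis_pyramid)]


# Toy two-level pyramid: the visible branch is nearly dark at both scales.
ir_pyramid = [[[2.0, 2.0], [2.0, 2.0]], [[1.5]]]
vis_pyramid = [[[0.0, 0.0], [0.0, 0.0]], [[0.1]]]
fused = fuse_pyramid(ir_pyramid, vis_pyramid)
```

In this toy input the visible branch carries almost no signal, so the gate assigns the larger weight to the infrared branch at both levels. That is the behaviour one would hope for in the low-light conditions the abstract mentions, though it says nothing about how the published module actually behaves.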

Published

2024-10-08

How to Cite

Ren, Z. (2024). Adaptive Multi-Scale Fusion for Infrared and Visible Object Detection in YOLOv8. Journal of Theory and Practice of Engineering Science, 4(09), 28–34. https://doi.org/10.53469/jtpes.2024.04(09).04