Industrial equipment identification based on multimodal adaptive fusion
-
Abstract: Image-text multimodal machine learning combines image and text data and can improve the accuracy of industrial equipment identification. Existing industrial equipment identification algorithms are built solely on image data and do not exploit the text data that is widespread in industrial settings. To address this, an image-text multimodal industrial equipment identification model is proposed, comprising a new image-text dataset and a multimodal identification algorithm. The dataset images were collected in real scenes, and the text was annotated automatically by a multimodal large model. The identification algorithm adaptively adjusts the modality weights and fuses the information from both modalities to support the model's decision. Experimental results show that the proposed algorithm improves identification accuracy by 17.62 percentage points over image-only input and by 5.3 percentage points over other multimodal identification algorithms, demonstrating its effectiveness.
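The adaptive weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the internals of the MAF model, so everything here is an assumption for illustration: the function names `modality_weights` and `adaptive_fusion`, the gate parameters `gate_w` and `gate_b`, and the choice of a softmax gate over the concatenated features. It shows only the general idea of computing one weight per modality for each sample and using those weights to blend image and text features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_weights(img_feat, txt_feat, gate_w, gate_b):
    """Return (w_img, w_txt), two non-negative weights that sum to 1.

    gate_w (2 rows, each of length len(img_feat) + len(txt_feat)) and
    gate_b (2 biases) are hypothetical learnable gate parameters.
    """
    joint = img_feat + txt_feat  # concatenate the two feature vectors
    scores = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(gate_w, gate_b)
    ]
    w_img, w_txt = softmax(scores)
    return w_img, w_txt


def adaptive_fusion(img_feat, txt_feat, gate_w, gate_b):
    """Blend same-length image and text features with per-sample weights."""
    w_img, w_txt = modality_weights(img_feat, txt_feat, gate_w, gate_b)
    fused = [w_img * i + w_txt * t for i, t in zip(img_feat, txt_feat)]
    return fused, (w_img, w_txt)


# Toy inputs: 3-dimensional features and an arbitrary, untrained gate.
img = [0.2, 0.9, 0.1]
txt = [0.7, 0.3, 0.5]
gate_w = [
    [0.5, -0.2, 0.1, 0.0, 0.3, -0.1],  # row that scores the image modality
    [-0.3, 0.4, 0.2, 0.6, -0.5, 0.1],  # row that scores the text modality
]
gate_b = [0.0, 0.0]

fused, weights = adaptive_fusion(img, txt, gate_w, gate_b)
print("modality weights:", weights)
print("fused feature:", fused)
```

In a trained model the gate parameters would be learned jointly with the classifier, so that a sample with an uninformative image leans on its text description and vice versa. The fused vector would then be passed to a classifier such as the linear classifier or support vector machine listed in Table 2.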
-
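Table 1. Dataset overview
Category                                Samples   Label
Parts and components                    300       0
Heavy machinery and equipment           300       1
Test and measurement equipment          300       2
Power tools and handheld equipment      300       3
Machine tools and machining equipment   300       4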
Table 2. Recognition results of MAF model
Classifier               Fusion method                    Accuracy/%   Precision/%   Recall/%   F1 score/%
Linear classifier        Image only                       72.45        75.31         73.48      74.38
                         Feature concatenation            82.44        82.57         80.74      81.64
                         Improved bilinear pooling [15]   86.81        83.72         89.74      86.63
                         Averaging                        83.71        85.89         81.44      83.61
                         MAF                              90.07        89.87         92.46      91.15
Support vector machine   Image only                       73.12        74.59         73.11      73.84
                         Feature concatenation            83.59        81.67         80.44      81.05
                         Improved bilinear pooling [15]   84.22        89.71         83.35      86.41
                         Averaging                        81.71        88.86         84.72      86.74
                         MAF                              89.52        90.97         90.48      90.72
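As a quick consistency check on Table 2, the sketch below recomputes the F1 score as the harmonic mean of the reported precision and recall for the linear-classifier rows, and derives the accuracy gains from the reported accuracies. The numbers are copied from the table. The helper name `f1_from` and the dictionary layout are illustrative choices, and how the paper averaged per-class metrics is not stated here, so this confirms only that the reported columns agree with each other.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (accuracy, precision, recall, reported F1) in percent,
# copied from the linear-classifier rows of Table 2.
linear_rows = {
    "Image only": (72.45, 75.31, 73.48, 74.38),
    "Feature concatenation": (82.44, 82.57, 80.74, 81.64),
    "Improved bilinear pooling": (86.81, 83.72, 89.74, 86.63),
    "Averaging": (83.71, 85.89, 81.44, 83.61),
    "MAF": (90.07, 89.87, 92.46, 91.15),
}

for name, (acc, prec, rec, reported_f1) in linear_rows.items():
    computed = f1_from(prec, rec)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {reported_f1:.2f}")

# Accuracy gain of MAF over image-only input (linear classifier),
# expressed in percentage points.
gain_over_image = linear_rows["MAF"][0] - linear_rows["Image only"][0]

# Accuracy gain of MAF over improved bilinear pooling with the support
# vector machine (89.52 vs 84.22 in Table 2), in percentage points.
gain_over_bilinear_svm = 89.52 - 84.22

print(f"MAF vs image only (linear): {gain_over_image:.2f} points")
print(f"MAF vs bilinear pooling (SVM): {gain_over_bilinear_svm:.2f} points")
```

The two differences printed at the end appear to correspond to the 17.62 and 5.3 figures quoted in the abstract, which suggests those figures are absolute differences in accuracy rather than relative improvements.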
Table 3. Results of ablation experiment
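Classifier               Ablation setting                         Accuracy/%   Precision/%   Recall/%   F1 score/%
Linear classifier        Without multimodal attention mechanism   86.31        87.38         84.24      85.78
                         Without weight computation unit          85.88        86.61         85.23      85.93
                         MAF                                      90.07        89.87         92.46      91.15
Support vector machine   Without multimodal attention mechanism   84.99        86.14         84.87      85.50
                         Without weight computation unit          84.36        85.91         84.11      85.00
                         MAF                                      89.52        90.97         90.48      90.72
-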
[1] 陶俊鹏, 张玮东, 钟倩文, 等. 基于振动信号图像特征的降噪残差网络轴承故障诊断[J]. 噪声与振动控制, 2024, 44(3): 109-116, 169. doi: 10.3969/j.issn.1006-1355.2024.03.017
[2] 周子杰, 展金, 李胜铭, 等. 复杂环境下的烧结机篦条故障实时检测方法研究[J/OL]. 机械科学与技术, 2024. doi: 10.13433/j.cnki.1003-8728.20240076
[3] 吴大钰, 王岩松, 李燕, 等. 基于声信号的汽车发动机故障诊断方法综述[J]. 渤海大学学报(自然科学版), 2008, 29(3): 264-267.
[4] 张阳, 刘瑾. 基于字符增强的工业设备故障命名实体识别[J]. 电子科技, 2024, 37(10): 48-54.
[5] 刘传洋, 吴一全. 基于红外图像的电力设备识别及发热故障诊断方法研究进展[J]. 中国电机工程学报, 2025, 45(6): 2171-2195.
[6] 徐哲壮, 黄平, 陈丹, 等. 融合机器视觉与邻近度估计的相似工业设备识别策略研究[J]. 仪器仪表学报, 2023, 44(1): 283-290.
[7] 王雨滢, 赵庆生, 梁定康. 基于深度学习网络的电气设备图像分类[J]. 科学技术与工程, 2020, 20(23): 9491-9496. doi: 10.3969/j.issn.1671-1815.2020.23.035
[8] YAO N, CHENG K. Electric power equipment image recognition based on deep forest learning model with few samples[J]. Journal of Physics: Conference Series, 2021, 1732: 012025. doi: 10.1088/1742-6596/1732/1/012025
[9] WANG Y T, LIU H R, WANG D L, et al. Image processing in fault identification for power equipment based on improved super green algorithm[J]. Computers & Electrical Engineering, 2020, 87: 106753.
[10] LI J N, LI D X, SAVARESE S, et al. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models[C]//Proceedings of the 40th International Conference on Machine Learning. Honolulu: PMLR, 2023: 19730-19742.
[11] LIU H T, LI C Y, WU Q Y, et al. Visual instruction tuning[C]//Proceedings of the 37th Conference on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2023: 1516.
[12] 肖进胜, 饶天宇, 贾茜, 等. 基于图切割的拉普拉斯金字塔图像融合算法[J]. 光电子·激光, 2014, 25(7): 1416-1424.
[13] 周永福, 李文龙, 胡冉冉. 多尺度特征融合的双通道SSD行人头部检测算法[J]. 激光与光电子学进展, 2021, 58(24): 375-386.
[14] YANG Z X, ZHU L C, WU Y, et al. Gated channel transformation for visual recognition[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11791-11800.
[15] KUMAR G K, NANDAKUMAR K. Hate-CLIPper: multimodal hateful meme classification based on cross-modal interaction of CLIP features[C]//Proceedings of the 2nd Workshop on NLP for Positive Impact. Abu Dhabi: Association for Computational Linguistics, 2022: 171-183.