Instance segmentation method based on improved Mask R-CNN for stacked automobile parts

Zhu Xinlong; Cui Guohua; Chen Saixuan; Yang Lin

doi:10.12299/jsues.21-0309

School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China

Funding: Natural Science Foundation of Shanghai (18030501200); Key Research and Development Program of Jiangsu Province (BE2020082-3)

About the author:
Zhu Xinlong (born 1995), male, master's student; research interest: robot vision. E-mail: 1070485584@qq.com

CLC number: TP242
Publication history
- Received: 2021-12-27
- Published online: 2022-11-16
- Issue date: 2022-06-30

Abstract:
To address the slow speed, low accuracy, and poor robustness of recognizing, detecting, and segmenting stacked automobile parts, a fast detection and instance segmentation method based on an improved Mask R-CNN algorithm is proposed. First, the feature extraction network of Mask R-CNN is optimized: the ResNet + Feature Pyramid Networks (FPN) backbone is replaced with MobileNets + FPN, which reduces the number of network parameters, compresses the model, and speeds up detection. Then, a Spatial Transformer Networks (STN) module is added after the ROI Align structure of Mask R-CNN to maintain detection accuracy. Experimental results show that the improved model is smaller (92.8 MB versus 249.9 MB) and roughly doubles detection speed (1.7 s versus 3.8 s per image); the mean Average Precision (mAP) also improves over the original model. Tests on new samples not used in training show that the improved model is faster, lighter, and more accurate than Mask R-CNN and can detect and segment stacked automobile parts quickly and accurately, which verifies its practical feasibility.

Keywords:
- instance segmentation
- stacked
- MobileNets model
- spatial transformer networks (STN)

Figure 1. Sample parts in the data set

Figure 2. Effects of the data augmentation methods

Figure 3. Annotation example

Figure 4. Depthwise separable convolution

Figure 5. STN module

Figure 6. Feature fusion of MobileNets and FPN

Figure 7. Improved Mask R-CNN network architecture

Figure 8. Loss function values

Figure 9. Instance segmentation process for the detected parts

Figure 10. Comparison of AP values for multi-object part detection before and after improvement

Figure 11. Recognition and segmentation results for automobile parts

Figure 12. Comparison of recognition accuracy before and after improvement

Table 1. Composition of the training, validation, and test sets (number of parts)

Part type	Training set	Validation set	Test set
Connecting rod	1981	788	1196
Gear shaft	906	354	538
Bolt	895	365	527
Nut	873	366	531
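
As a quick check on Table 1, the split proportions can be recomputed from the per-class counts. The following is a minimal sketch; the names `COUNTS`, `split_totals`, and `split_shares` are illustrative and do not come from the paper.

```python
# Part counts per class from Table 1: (training, validation, test).
COUNTS = {
    "connecting rod": (1981, 788, 1196),
    "gear shaft": (906, 354, 538),
    "bolt": (895, 365, 527),
    "nut": (873, 366, 531),
}


def split_totals(counts):
    """Sum each split (training, validation, test) over all classes."""
    return tuple(sum(row[i] for row in counts.values()) for i in range(3))


def split_shares(counts):
    """Return each split's share of all annotated parts."""
    totals = split_totals(counts)
    grand_total = sum(totals)
    return tuple(t / grand_total for t in totals)


if __name__ == "__main__":
    train, val, test = split_totals(COUNTS)
    print(f"training={train}, validation={val}, test={test}")
    for name, share in zip(("training", "validation", "test"), split_shares(COUNTS)):
        print(f"{name}: {share:.1%}")
```

With the tabulated counts, the three splits hold roughly 50%, 20%, and 30% of the annotated parts.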

Table 2. Dataset format

Dataset file	Function
imgs	Training images
label_viz	Segmentation images after annotation
mask	Mask images after annotation
yaml	Position files after annotation

Table 3. Five stages of the modified MobileNets (Conv: standard convolution; Convd: depthwise convolution)

Stage	Input	Convolution type	Filter size/stride	Output
1	224 × 224 × 3	Conv	3 × 3 × 3 × 32/2	V1
	112 × 112 × 32	Convd	3 × 3 × 32/1
	112 × 112 × 32	Conv	1 × 1 × 32 × 64/1
2	112 × 112 × 64	Convd	3 × 3 × 64/2	V2
	56 × 56 × 64	Conv	1 × 1 × 64 × 128/1
	56 × 56 × 128	Convd	3 × 3 × 128/1
	56 × 56 × 128	Conv	1 × 1 × 128 × 128/1
3	56 × 56 × 128	Convd	3 × 3 × 128/2	V3
	28 × 28 × 128	Conv	1 × 1 × 128 × 256/1
	28 × 28 × 256	Convd	3 × 3 × 256/1
	28 × 28 × 256	Conv	1 × 1 × 256 × 256/1
4	28 × 28 × 256	Convd	3 × 3 × 256/2	V4
	14 × 14 × 256	Conv	1 × 1 × 256 × 512/1
	14 × 14 × 512	Convd × 5	3 × 3 × 512/1
	14 × 14 × 512	Conv × 5	1 × 1 × 512 × 512/1
5	14 × 14 × 512	Convd	3 × 3 × 512/2	V5
	7 × 7 × 512	Conv	1 × 1 × 512 × 1024/1
	7 × 7 × 1024	Convd	3 × 3 × 1024/1
	7 × 7 × 1024	Conv	1 × 1 × 1024 × 1024/1
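
To illustrate why the depthwise separable layers in Table 3 shrink the backbone, the sketch below counts convolution weights for the listed layers and compares them with a hypothetical variant in which every depthwise + 1 × 1 pair is replaced by one standard 3 × 3 convolution. It assumes that Convd denotes a depthwise convolution, ignores biases and batch-normalization parameters, and assumes "same" padding when tracing the spatial size. The names `LAYERS`, `separable_weights`, `standard_weights`, and `final_size` are illustrative and do not come from the paper.

```python
import math

# Layers transcribed from Table 3 as
# (kind, kernel, in_channels, out_channels, stride, repeat).
# "conv" is a standard convolution; "dw" is a depthwise convolution
# (assumed meaning of Convd), which keeps the channel count unchanged.
LAYERS = [
    ("conv", 3, 3, 32, 2, 1),
    ("dw", 3, 32, 32, 1, 1), ("conv", 1, 32, 64, 1, 1),
    ("dw", 3, 64, 64, 2, 1), ("conv", 1, 64, 128, 1, 1),
    ("dw", 3, 128, 128, 1, 1), ("conv", 1, 128, 128, 1, 1),
    ("dw", 3, 128, 128, 2, 1), ("conv", 1, 128, 256, 1, 1),
    ("dw", 3, 256, 256, 1, 1), ("conv", 1, 256, 256, 1, 1),
    ("dw", 3, 256, 256, 2, 1), ("conv", 1, 256, 512, 1, 1),
    ("dw", 3, 512, 512, 1, 5), ("conv", 1, 512, 512, 1, 5),
    ("dw", 3, 512, 512, 2, 1), ("conv", 1, 512, 1024, 1, 1),
    ("dw", 3, 1024, 1024, 1, 1), ("conv", 1, 1024, 1024, 1, 1),
]


def layer_weights(kind, kernel, c_in, c_out):
    """Weight count of one layer (no bias, no batch normalization)."""
    if kind == "dw":
        return kernel * kernel * c_in        # one k x k filter per channel
    return kernel * kernel * c_in * c_out    # full convolution


def separable_weights(layers):
    """Total weights of the depthwise separable stack as listed."""
    return sum(layer_weights(kind, k, c_in, c_out) * rep
               for kind, k, c_in, c_out, _, rep in layers)


def standard_weights(layers):
    """Total weights if each depthwise + pointwise pair were one standard conv."""
    total = 0
    dw_kernel = None
    for kind, k, c_in, c_out, _, rep in layers:
        if kind == "dw":
            dw_kernel = k                    # remember the spatial kernel size
        elif dw_kernel is not None:
            # Pointwise layer after a depthwise one: merge into a k x k conv.
            total += dw_kernel * dw_kernel * c_in * c_out * rep
            dw_kernel = None
        else:
            total += layer_weights(kind, k, c_in, c_out) * rep
    return total


def final_size(layers, size=224):
    """Spatial size after all layers, assuming 'same' padding."""
    for _, _, _, _, stride, rep in layers:
        for _ in range(rep):
            size = math.ceil(size / stride)
    return size


if __name__ == "__main__":
    sep, std = separable_weights(LAYERS), standard_weights(LAYERS)
    print(f"separable: {sep}, standard: {std}, ratio: {std / sep:.2f}")
    print(f"final feature map: {final_size(LAYERS)} x {final_size(LAYERS)}")
```

Under these assumptions the separable stack has about 3.2 million weights against about 28.3 million for the all-standard variant, close to a 9-fold reduction, and the 224 × 224 input ends at a 7 × 7 feature map as in the last rows of the table. The count covers only the backbone convolutions, not the FPN or the Mask R-CNN heads.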

Table 4. Comparison of model size and detection time before and after improvement

Metric	Mask R-CNN	Improved Mask R-CNN
Model size (MB)	249.9	92.8
Detection time per image (s)	3.8	1.7
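
The size and time figures in Table 4 translate into relative gains as sketched below. The function names `reduction` and `speedup` are illustrative and do not come from the paper.

```python
# Values from Table 4: (original Mask R-CNN, improved Mask R-CNN).
MODEL_SIZE_MB = (249.9, 92.8)
TIME_PER_IMAGE_S = (3.8, 1.7)


def reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after


if __name__ == "__main__":
    print(f"model size reduction: {reduction(*MODEL_SIZE_MB):.1%}")
    print(f"detection speedup: {speedup(*TIME_PER_IMAGE_S):.2f}x")
```

With the tabulated values this gives a size reduction of about 63% and a speedup of about 2.2 times, consistent with the roughly doubled detection speed reported in the abstract.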

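Table 5. Comparison of recognition accuracy before and after improvement

Model	Category	Number of test samples	Correctly recognized parts	Misrecognized and unrecognized parts	Recognition accuracy (%)
Before improvement	Single object	120	115	5	95.8
Before improvement	Multiple objects, scattered	120	397	21	94.9
Before improvement	Multiple objects, stacked	120	451	45	90.9
After improvement	Single object	120	116	4	96.7
After improvement	Multiple objects, scattered	120	401	17	96.4
After improvement	Multiple objects, stacked	120	474	22	95.6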
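
The accuracy column of Table 5 can be recomputed from the counts. The sketch below assumes that accuracy is the number of correctly recognized parts divided by all parts (correct plus misrecognized or unrecognized); this definition is inferred from the column headings and is not stated in the table. The names `ROWS` and `accuracy_percent` are illustrative.

```python
# Rows from Table 5:
# (model, scenario, correct, wrong_or_missed, reported_percent).
ROWS = [
    ("before", "single object", 115, 5, 95.8),
    ("before", "multiple objects, scattered", 397, 21, 94.9),
    ("before", "multiple objects, stacked", 451, 45, 90.9),
    ("after", "single object", 116, 4, 96.7),
    ("after", "multiple objects, scattered", 401, 17, 96.4),
    ("after", "multiple objects, stacked", 474, 22, 95.6),
]


def accuracy_percent(correct, wrong_or_missed):
    """Share of correctly recognized parts among all parts, in percent."""
    return 100.0 * correct / (correct + wrong_or_missed)


if __name__ == "__main__":
    for model, scenario, correct, wrong, reported in ROWS:
        computed = accuracy_percent(correct, wrong)
        print(f"{model:6s} {scenario:28s} "
              f"computed={computed:.1f} reported={reported}")
```

Under this assumed definition, the single-object and stacked rows reproduce the tabulated percentages. The two scattered multi-object rows come out at 95.0 and 95.9 rather than the tabulated 94.9 and 96.4, so either the definition or a count may differ slightly for those rows.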
