
Offline reinforcement learning dynamic obstacles avoidance navigation algorithm

GE Qixing, ZHANG Wei, XIE Guiliang, HU Zhi

Citation: GE Qixing, ZHANG Wei, XIE Guiliang, HU Zhi. Offline reinforcement learning dynamic obstacles avoidance navigation algorithm[J]. Journal of Shanghai University of Engineering Science, 2024, 38(3): 313-320. doi: 10.12299/jsues.23-0227

doi: 10.12299/jsues.23-0227
Article information
    About the authors:

    GE Qixing (1988−), male, master's student; research interests: path planning algorithms for mobile robots and UAVs. E-mail: 1511094206@qq.com

    Corresponding author:

    ZHANG Wei (1977−), male, professor, Ph.D.; research interests: swarm intelligence, multi-agent cooperative control, formation control, robust control and state observation for nonlinear systems, and UAV obstacle avoidance control. E-mail: wizzhang@foxmail.com

  • CLC number: V249.1; V279


  • Abstract: A pressing problem in applying deep reinforcement learning (DRL) to collision avoidance is that the unmanned aerial vehicle (UAV) must continuously sample and update data in real time to optimize its obstacle avoidance policy. To address this, a dynamic obstacle avoidance navigation algorithm based on offline DRL is proposed. An offline DRL algorithm is combined with the velocity obstacle (VO) method, alleviating the online DRL requirement for highly real-time interaction data. The performance of the offline DRL algorithm is improved by constraining its policy updates. A VO-based reward function is developed so that the UAV avoids dynamic obstacles while also accounting for flight time and path length. Simulations in a three-dimensional obstacle avoidance navigation environment further verify that the proposed method outperforms online DRL obstacle avoidance algorithms in path length, flight time, and obstacle avoidance success rate, effectively mitigating the need for a continuous stream of online data to update the DRL policy.
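The abstract describes a reward shaped by the velocity obstacle (VO) condition, which penalizes velocities that would lead to a collision within some time horizon. The paper's exact formulation is not reproduced here; the following is a minimal illustrative sketch, assuming hypothetical inputs (relative position, relative velocity, combined radius) and placeholder weights, of how a VO membership check and a VO-shaped reward term could look.

```python
import numpy as np

def in_velocity_obstacle(p_rel, v_rel, radius, tau=5.0):
    """Return True if the relative velocity lies inside the (truncated)
    velocity-obstacle cone of an obstacle.

    p_rel  : obstacle position relative to the UAV (3-vector)
    v_rel  : UAV velocity relative to the obstacle (3-vector)
    radius : combined collision radius of UAV and obstacle
    tau    : time horizon; collisions later than tau are ignored
    """
    dist = np.linalg.norm(p_rel)
    if dist <= radius:                      # already in collision
        return True
    half_angle = np.arcsin(radius / dist)   # half-angle of the collision cone
    speed = np.linalg.norm(v_rel)
    if speed < 1e-9:
        return False
    # Angle between the relative velocity and the line of sight to the obstacle.
    cos_angle = np.dot(v_rel, p_rel) / (speed * dist)
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    heading_into_cone = angle <= half_angle
    # Time of closest approach; only count collisions within tau seconds.
    time_to_closest = np.dot(p_rel, v_rel) / (speed ** 2)
    return heading_into_cone and 0.0 <= time_to_closest <= tau

def vo_shaped_reward(p_rel, v_rel, radius, goal_dist_prev, goal_dist_now,
                     w_goal=1.0, w_vo=0.5, step_penalty=0.05):
    """Illustrative reward (weights are assumptions, not the paper's values):
    progress toward the goal, minus a per-step cost that favors short and
    fast paths, minus a penalty when the velocity falls inside the VO cone."""
    reward = w_goal * (goal_dist_prev - goal_dist_now) - step_penalty
    if in_velocity_obstacle(p_rel, v_rel, radius):
        reward -= w_vo
    return reward
```

A reward of this shape ties the avoidance objective (stay outside the VO cone) to the navigation objective (make progress quickly), which matches the abstract's stated goal of avoiding dynamic obstacles while limiting flight time and path length; the weights and horizon above are placeholders.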
  • Figure 1.  An illustration of the velocity obstacle method

    Figure 2.  Single dynamic obstacle environments

    Figure 3.  Path length of UAV obstacle avoidance and navigation in single dynamic obstacle environments

    Figure 4.  Multi-dynamic obstacle environments

    Figure 5.  Path length of UAV obstacle avoidance and navigation in multi-dynamic obstacle environments
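The tables below compare two online DRL baselines (TD3, PPO) with the offline algorithms BCQ and PBCQ. The abstract notes that the offline method improves performance by constraining policy updates to stay close to the logged data. As background for those columns, here is a minimal sketch of the batch-constrained action-selection idea used by BCQ-style methods; the q_network and behavior_model callables are hypothetical assumptions, and this is not the paper's PBCQ implementation.

```python
import numpy as np

def batch_constrained_action(state, q_network, behavior_model,
                             n_candidates=10, perturb_scale=0.05):
    """Pick an action for `state` without straying far from the dataset.

    behavior_model(state, n) is assumed to sample n candidate actions
    resembling those stored in the offline dataset; q_network(state, a)
    is assumed to return their estimated values. Instead of a free argmax
    over all actions (which exploits extrapolation error on unseen
    actions), the choice is restricted to data-like candidates, each
    perturbed only slightly.
    """
    candidates = behavior_model(state, n_candidates)          # (n, act_dim)
    noise = np.random.uniform(-perturb_scale, perturb_scale,
                              size=candidates.shape)
    candidates = np.clip(candidates + noise, -1.0, 1.0)       # small perturbation
    values = np.array([q_network(state, a) for a in candidates])
    return candidates[int(np.argmax(values))]
```

Restricting the value maximization to data-like candidates is what allows the policy to improve from a fixed logged dataset instead of requiring fresh, real-time interaction, which is the motivation stated in the abstract.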

    Table 1.  Simulated obstacle avoidance and navigation metrics of the algorithms in single dynamic obstacle environments

    Env. |        Path length        |        Flight time        |       Success rate
         | TD3   PPO   BCQ   PBCQ    | TD3   PPO   BCQ   PBCQ    | TD3   PPO   BCQ   PBCQ
    1    | 66.1  14.9  15.2  14.7    | 42.0  15.0  11.5  11.3    | 0.98  0.99  0.85  0.95
    2    | 29.1  15.3  17.4  17.2    | 20.6  14.3  12.7  12.4    | 1     0.99  0.97  1
    3    | 20.5  15.3  16.1  15.3    | 21.0  12.7  12.4  12.2    | 0.87  0.99  0.99  0.99
    4    | 17.9  15.6  14.5  14.2    | 20.6  13.2  12.5  11.5    | 0.92  0.98  0.97  0.99

    Table 2.  Simulated obstacle avoidance and navigation metrics of the algorithms in multi-dynamic obstacle environments

    Env. |        Path length        |        Flight time        |       Success rate
         | TD3   PPO   BCQ   PBCQ    | TD3   PPO   BCQ   PBCQ    | TD3   PPO   BCQ   PBCQ
    1    | 28.3  17.2  21.0  19.8    | 23.1  18.8  20.2  17.9    | 0.97  0.98  0.77  0.96
    2    | 16.8  16.0  22.4  17.0    | 12.2  17.1  19.3  16.7    | 1     0.91  0.99  0.99
    3    | 18.5  15.6  17.6  15.0    | 15.4  15.5  17.2  15.4    | 0.93  0.87  0.85  0.94
    4    | 18.6  15.9  19.9  15.7    | 15.3  15.7  19.8  16.3    | 0.93  0.87  0.99  0.99
Article history
  • Received: 2023-11-10
  • Available online: 2024-11-14
  • Published in issue: 2024-09-30
