Robot Imitation Learning Method Based on Trajectory Probability Matching

Affiliation:

College of Electronic Information and Control Engineering, Beijing University of Technology

CLC number:

TP 181

Funding:

National Natural Science Foundation of China (61375086); National Natural Science Foundation of China (61075110); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20101103110007).


    Abstract:

    Imitation learning is one of the main topics in research on bionic mechanisms for robots and an important way for a robot to acquire new skills quickly: the robot achieves bionic behavior by observing, understanding, learning, and imitating a demonstrated behavior. To address shortcomings of traditional methods, this paper introduces probabilistic model matching into imitation learning. Gaussian processes are used to represent both the demonstration trajectory, built from sampled discrete demonstration signals, and the imitation trajectory, generated by a policy with unknown parameters. The KL divergence between the probability distributions of the two trajectories serves as the cost function, and gradient descent is used to find the control policy that minimizes it. The resulting policy is then applied to the imitating robot so that it performs the same task as the demonstration. The method is tested in simulation on imitation of the swinging motion of the arm of an articulated robot. The results show that imitation learning based on trajectory probability matching reproduces the swinging behavior, with a simpler learning process and better learning performance than traditional methods.
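
    The matching step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: instead of full Gaussian process models, it assumes each trajectory is summarized by independent Gaussian marginals at every time step, with a fixed variance, and it assumes the imitation mean is linear in the policy parameters. The names `gaussian_kl`, `fit_policy`, `features`, and the sinusoidal demonstration are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Sum over time steps of KL(N(mu_p, var_p) || N(mu_q, var_q))."""
    per_step = 0.5 * (np.log(var_q / var_p)
                      + (var_p + (mu_p - mu_q) ** 2) / var_q
                      - 1.0)
    return float(np.sum(per_step))


def fit_policy(features, mu_demo, var_imit, lr=0.02, steps=500):
    """Gradient descent on the KL cost with respect to the policy parameters.

    Assumption: the imitation mean is features @ theta and the imitation
    variance is fixed, so only the mean term of the KL depends on theta.
    """
    theta = np.zeros(features.shape[1])
    n = len(mu_demo)
    for _ in range(steps):
        mu_imit = features @ theta
        # Analytic gradient of the summed KL with respect to theta,
        # averaged over time steps to keep the step size well scaled.
        grad = -features.T @ ((mu_demo - mu_imit) / var_imit) / n
        theta -= lr * grad
    return theta


# Hypothetical demonstration: a swinging joint angle sampled at 50 time steps.
t = np.linspace(0.0, 2.0 * np.pi, 50)
mu_demo = 0.8 * np.sin(t) + 0.1
var_demo = np.full_like(t, 0.04)
var_imit = np.full_like(t, 0.04)

# Hypothetical policy features: the imitation mean is a weighted sum of these.
features = np.column_stack([np.sin(t), np.cos(t), np.ones_like(t)])

theta = fit_policy(features, mu_demo, var_imit)
kl_start = gaussian_kl(mu_demo, var_demo, features @ np.zeros(3), var_imit)
kl_end = gaussian_kl(mu_demo, var_demo, features @ theta, var_imit)
print("learned parameters:", theta)
print("KL before and after:", kl_start, kl_end)
```

    With equal, fixed variances the KL cost reduces to a scaled squared distance between the two mean trajectories, which is why plain gradient descent suffices here. In the setting the abstract describes, the imitation distribution would instead come from a Gaussian process model of the robot under the parameterized policy, so both the mean and the variance would depend on the policy parameters.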

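Cite this article

刘涛, 于建均, 阮晓钢. Robot Imitation Learning Method Based on Trajectory Probability Matching [J]. Computer Measurement & Control (计算机测量与控制), 2015, 23(11): 6.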

History
  • Received: 2015-04-17
  • Last revised: 2015-05-22
  • Accepted: 2015-05-22
  • Published online: 2015-11-18