Abstract: Existing planning methods are mostly manual and suffer from inefficiency, high cost, and susceptibility to misoperation. To address this, the characteristics of micro-assembly operation tasks and the collaborative and competitive relationships among micro-assembly operations are analyzed in detail, and a method is proposed for constructing actions, states, and reward conditions in multi-agent reinforcement learning that conform to the characteristics of micro-assembly tasks. The existing equipment is physically modeled in the CoppeliaSim simulation software, a learning model is built and trained with the multi-agent deep deterministic policy gradient (MADDPG) algorithm, and the designed actions, states, and reward function are verified experimentally in the simulation environment. A stable path and a complete task implementation scheme are ultimately obtained. The simulation results show that the proposed method is better suited to micro-assembly tasks whose main framework is Cartesian coordinate motion and that it overcomes the shortcomings of existing planning methods. The method also enables collaborative operation of multiple manipulator arms, can be put into engineering practice, and improves both task efficiency and the degree of planning automation.