A dynamic time slot allocation scheme for multi-UAV networks based on reinforcement learning is investigated. In UAV networks, allocating time slot resources reasonably is important for improving resource utilization. To address the dynamic time slot allocation problem, a time slot allocation model of the multi-UAV network is established according to the constraints of the scheduling problem, and a time slot allocation scheme based on the proximal policy optimization (PPO) reinforcement learning algorithm is proposed. The allocation problem is mapped to a reinforcement learning environment by building a Markov decision process (MDP) model that matches the interface of the reinforcement learning algorithm. The model is trained in the Gym simulation environment to validate the proposed scheme. The simulation results show that the PPO-based scheme performs time slot allocation efficiently and improves network channel utilization in a multi-UAV network environment. Depending on the actual demand, the training time of the proposed scheme can be reduced appropriately while still obtaining good allocation results.
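
To make the idea of mapping slot allocation onto an MDP more concrete, the following is a minimal Python sketch. It is not the model used in this work: the class name `SlotAllocationEnv`, the queue-length state, the Bernoulli packet arrivals, and the reward of 1 per occupied slot are all assumptions chosen for illustration. The class only mirrors the `reset`/`step` convention of Gym-style environments without importing the library, and two hand-written policies stand in for a trained PPO agent.

```python
import random


class SlotAllocationEnv:
    """Toy MDP (hypothetical): one decision per time slot, choosing which UAV transmits."""

    def __init__(self, num_uavs=4, slots_per_frame=20, arrival_prob=0.3, seed=0):
        self.num_uavs = num_uavs
        self.slots_per_frame = slots_per_frame
        self.arrival_prob = arrival_prob  # assumed per-slot packet arrival probability
        self.rng = random.Random(seed)

    def reset(self):
        # State: number of queued packets per UAV (each UAV starts with one packet).
        self.queues = [1] * self.num_uavs
        self.slot = 0
        self.used = 0
        return tuple(self.queues)

    def step(self, action):
        # Action: index of the UAV that is granted the current slot.
        if self.queues[action] > 0:
            self.queues[action] -= 1
            self.used += 1
            reward = 1.0  # the slot carried a packet
        else:
            reward = 0.0  # the slot was wasted
        # New packets arrive independently at each UAV.
        for i in range(self.num_uavs):
            if self.rng.random() < self.arrival_prob:
                self.queues[i] += 1
        self.slot += 1
        done = self.slot >= self.slots_per_frame
        info = {"utilization": self.used / self.slot}
        return tuple(self.queues), reward, done, info


def longest_queue_policy(state):
    """Grant the slot to the UAV with the most queued packets."""
    return max(range(len(state)), key=lambda i: state[i])


def fixed_policy(state):
    """Always grant the slot to UAV 0, regardless of demand."""
    return 0


def run_episode(env, policy):
    """Run one frame and return the fraction of slots that carried a packet."""
    state = env.reset()
    done = False
    info = {"utilization": 0.0}
    while not done:
        state, _, done, info = env.step(policy(state))
    return info["utilization"]


if __name__ == "__main__":
    env = SlotAllocationEnv()
    print("longest-queue utilization:", run_episode(env, longest_queue_policy))
    print("fixed-UAV utilization:   ", run_episode(env, fixed_policy))
```

In this sketch the channel utilization is simply the share of slots in a frame that carried a packet, so a policy that follows demand wastes fewer slots than a static assignment. A learning agent such as PPO would replace the hand-written policies and be trained on the per-slot reward.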