Abstract
This study explores LLM-guided reward shaping for multi-agent formation control with collision avoidance. The LLMs iteratively generate and refine reward functions from high-level task objectives and observed failure modes. Evaluation across 600 simulated and real-world episodes demonstrates that the approach reaches target formations with 35.2% fewer training iterations and improves the collision-free success rate by 28.4% compared with manually designed rewards.
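A minimal sketch of the generate-train-refine loop the abstract describes, assuming injected callables for the LLM query and the RL training/evaluation step; the function names, prompt format, and stopping rule are illustrative placeholders, not the paper's implementation.

```python
from typing import Callable, Optional

def refine_reward(
    llm: Callable[[str], str],                      # assumed LLM interface: prompt -> reward code
    train_and_evaluate: Callable[[str], Optional[str]],  # assumed RL hook: reward code -> failure summary
    task_spec: str,
    max_iterations: int = 5,
) -> str:
    """Iteratively generate and refine a reward function with an LLM.

    `train_and_evaluate` trains agents with the candidate reward and
    returns a textual failure summary, or None once episodes complete
    collision-free. Both callables are hypothetical interfaces.
    """
    reward_code = ""
    failures = "none yet"
    for _ in range(max_iterations):
        prompt = (
            f"Task: {task_spec}\n"
            f"Previous reward function:\n{reward_code or '(none)'}\n"
            f"Observed failures: {failures}\n"
            "Write a Python function reward(state, action) -> float that "
            "rewards reaching the target formation and penalizes collisions."
        )
        reward_code = llm(prompt)                   # generate or refine the reward code
        failures = train_and_evaluate(reward_code)  # observe remaining failure modes
        if failures is None:                        # stop once runs are collision-free
            break
    return reward_code
```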

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright (c) 2026 Wei Jie Tan, Xiu Fen Li, Jun Hao Ng