Abstract
The domain of robotic manipulation has witnessed significant advancements through the application of deep reinforcement learning, yet substantial challenges remain regarding sample efficiency, generalization, and robustness against environmental perturbations. This paper introduces a novel Hierarchical Deep Reinforcement Learning framework integrated with a Stochastic Policy Gradient mechanism, designed specifically to address the high-dimensional state-action spaces inherent in multi-joint robotic control. By decomposing complex manipulation tasks into temporally extended sub-goals managed by a high-level policy and executing primitive motor commands via a low-level controller, the proposed architecture effectively mitigates the sparse reward problem. Furthermore, the incorporation of a stochastic policy gradient enables the agent to maintain extensive exploration while ensuring robust performance in the presence of sensor noise and dynamic friction changes. We demonstrate the efficacy of this approach through rigorous simulation experiments involving complex pick-and-place and stacking tasks. The results indicate that our method significantly outperforms several state-of-the-art baselines in terms of convergence speed and success rate under adversarial conditions.
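To make the hierarchical decomposition concrete, the listing below is a minimal sketch of a two-level stochastic policy in PyTorch. It assumes a discrete sub-goal space for the high-level policy and a Gaussian distribution over primitive joint commands for the low-level controller; all class names, dimensions, and hyperparameters are illustrative assumptions and do not reproduce the implementation described in the paper.

# Minimal illustrative sketch of a two-level (hierarchical) stochastic policy.
# All names and hyperparameters are assumptions for illustration only; they are
# not taken from the paper. Requires: torch (PyTorch).
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HighLevelPolicy(nn.Module):
    """Selects a temporally extended sub-goal from the current state."""
    def __init__(self, state_dim: int, num_subgoals: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_subgoals)
        )

    def forward(self, state: torch.Tensor) -> Categorical:
        # Categorical distribution over discrete sub-goals (an assumption;
        # a continuous sub-goal space is equally possible).
        return Categorical(logits=self.net(state))

class LowLevelPolicy(nn.Module):
    """Gaussian over primitive joint commands, conditioned on the sub-goal."""
    def __init__(self, state_dim: int, num_subgoals: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_subgoals, hidden), nn.ReLU(), nn.Linear(hidden, action_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state: torch.Tensor, subgoal_onehot: torch.Tensor) -> Normal:
        mean = self.net(torch.cat([state, subgoal_onehot], dim=-1))
        return Normal(mean, self.log_std.exp())  # stochastic policy drives exploration

def policy_gradient_loss(log_probs: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    """Vanilla stochastic policy-gradient (REINFORCE-style) objective: -E[log pi * G]."""
    return -(log_probs * returns).mean()

if __name__ == "__main__":
    state_dim, num_subgoals, action_dim = 12, 4, 7  # e.g. a 7-DoF arm (illustrative)
    high = HighLevelPolicy(state_dim, num_subgoals)
    low = LowLevelPolicy(state_dim, num_subgoals, action_dim)

    state = torch.randn(1, state_dim)
    g_dist = high(state)
    g = g_dist.sample()                                             # sample a sub-goal
    g_onehot = torch.nn.functional.one_hot(g, num_subgoals).float()
    a_dist = low(state, g_onehot)
    a = a_dist.sample()                                             # primitive joint command
    # Log-probabilities feed the policy-gradient update at each level of the hierarchy.
    print(g_dist.log_prob(g).item(), a_dist.log_prob(a).sum(-1).item())

In this sketch each level keeps its own log-probabilities so that the high-level and low-level policies can be updated with separate policy-gradient terms; how credit is actually assigned between the two levels is a design choice of the full method, not shown here.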

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright (c) 2026 Ha-Eun Yoon, Ji-A Jang (Authors)