Deep reinforcement learning for weakly coupled MDPs with continuous actions
Abstract
This paper introduces the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm specifically designed for weakly coupled MDP problems with continuous action spaces. LPCA addresses the challenge of resource constraints that depend on continuous actions by introducing a Lagrange relaxation of the weakly coupled MDP problem within a neural network framework for Q-value computation. This approach effectively decouples the MDP, enabling efficient policy learning in resource-constrained environments. We present two variations of LPCA: LPCA-DE, which utilizes differential evolution for global optimization, and LPCA-Greedy, a method that incrementally and greedily selects actions based on Q-value gradients. Comparative analysis against other state-of-the-art techniques across various settings highlights LPCA's robustness and efficiency in managing resource allocation while maximizing rewards.
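To make the decoupling idea concrete, the sketch below illustrates, under simplified assumptions that are not taken from the paper, how a Lagrange relaxation can separate a weakly coupled problem into per-sub-MDP optimizations. It assumes each sub-MDP i exposes a known Q-function q_i(s_i, a_i), actions are scalars in [0, 1], and the resource cost of an action is the action itself; the names q_funcs, lam, budget, and step are illustrative placeholders. The "DE" variant uses scipy's differential_evolution on the decoupled Lagrangian, and the "greedy" variant allocates small action increments to whichever sub-MDP yields the largest marginal Lagrangian gain (a finite-difference stand-in for the Q-value gradients mentioned in the abstract). This is a minimal sketch of the general technique, not the authors' implementation.

```python
# Minimal sketch of Lagrangian decoupling for a weakly coupled MDP with
# continuous actions. Illustrative only; q_funcs, lam, budget are assumptions.
import numpy as np
from scipy.optimize import differential_evolution


def lagrangian_value(actions, q_funcs, states, lam):
    """Sum over sub-MDPs of Q_i(s_i, a_i) minus the Lagrange penalty lam * a_i."""
    return sum(q(s, a) - lam * a for q, s, a in zip(q_funcs, states, actions))


def lpca_de_step(q_funcs, states, lam):
    """Global maximization of the decoupled Lagrangian via differential evolution."""
    n = len(q_funcs)
    # differential_evolution minimizes, so negate the Lagrangian objective.
    result = differential_evolution(
        lambda a: -lagrangian_value(a, q_funcs, states, lam),
        bounds=[(0.0, 1.0)] * n,
        seed=0,
    )
    return result.x


def lpca_greedy_step(q_funcs, states, lam, budget, step=0.05):
    """Greedy variant: repeatedly give a small action increment to the sub-MDP
    with the largest marginal Lagrangian gain, until the budget is exhausted."""
    actions = np.zeros(len(q_funcs))
    spent = 0.0
    while spent + step <= budget:
        gains = [
            (q(s, a + step) - lam * (a + step)) - (q(s, a) - lam * a)
            for q, s, a in zip(q_funcs, states, actions)
        ]
        best = int(np.argmax(gains))
        if gains[best] <= 0:  # no sub-MDP benefits from more resource
            break
        actions[best] += step
        spent += step
    return actions


if __name__ == "__main__":
    # Two toy sub-MDPs with concave, state-dependent Q-functions.
    q_funcs = [lambda s, a: s * np.sqrt(a), lambda s, a: s * np.log1p(a)]
    states = [2.0, 1.0]
    print(lpca_de_step(q_funcs, states, lam=0.5))
    print(lpca_greedy_step(q_funcs, states, lam=0.5, budget=1.0))
```

In a full deep RL setting the hand-written q_funcs would be replaced by learned per-sub-MDP Q-networks, and the multiplier lam would itself be adapted to keep the coupled resource constraint satisfied in expectation.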