Deep reinforcement learning for weakly coupled MDP's with continuous actions

This paper introduces the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm designed for weakly coupled MDP problems with continuous action spaces. LPCA addresses the challenge of resource constraints that depend on continuous actions by introducing a Lagrange relaxation of the weakly coupled MDP problem within a neural network framework for Q-value computation. This approach effectively decouples the MDP, enabling efficient policy learning in resource-constrained environments. We present two variations of LPCA: LPCA-DE, which uses differential evolution for global optimization, and LPCA-Greedy, a method that incrementally and greedily selects actions based on Q-value gradients. Comparative analysis against other state-of-the-art techniques across various settings highlights LPCA’s robustness and efficiency in managing resource allocation while maximizing rewards.
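
The incremental, gradient-driven selection that the abstract attributes to LPCA-Greedy can be illustrated on a toy problem. The sketch below is not the authors' implementation: the function `greedy_allocate`, the step size, and the concave stand-in functions `w * sqrt(a)` (used here in place of learned Q-value networks) are all assumptions made for illustration. It splits a shared resource budget across three independent arms by repeatedly giving a small increment to the arm with the largest finite-difference gain.

```python
import math


def greedy_allocate(q_funcs, budget, step=0.01):
    """Split `budget` across arms in increments of `step`.

    Hypothetical helper: each entry of `q_funcs` maps a scalar action to a
    value and stands in for a learned Q-value. At every iteration the
    increment goes to the arm with the largest finite-difference gain.
    """
    actions = [0.0] * len(q_funcs)
    n_steps = int(round(budget / step))
    for _ in range(n_steps):
        # Finite-difference approximation of each arm's value gradient.
        gains = [q(a + step) - q(a) for q, a in zip(q_funcs, actions)]
        best = max(range(len(gains)), key=gains.__getitem__)
        if gains[best] <= 0:
            break  # no arm benefits from more resource
        actions[best] += step
    return actions


# Assumed toy values: concave, increasing, with different weights per arm.
weights = [1.0, 2.0, 3.0]
q_funcs = [lambda a, w=w: w * math.sqrt(a) for w in weights]

actions = greedy_allocate(q_funcs, budget=1.0, step=0.01)
print(actions, sum(actions))
```

For these particular stand-in functions the continuous optimum allocates the budget in proportion to the squared weights, so the greedy result can be checked against it; with non-concave values a greedy scheme of this kind carries no such guarantee.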

Keywords

Computing methodologies → Sequential decision making; Machine learning; Reinforcement Learning; Lagrangian relaxation; Markov Decision Problem; Weakly Coupled MDP; Continuous Actions; Lagrange Policy; Neural Networks; Differential Evolution; Resource Allocation; Policy Optimization

Domains

Multiagent Systems [cs.MA]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

Deep reinforcement learning for weakly coupled MDPs with continuous actions.pdf (388.65 KB)

Origin: Files produced by the author(s)

Contributor: Francisco Robledo

https://hal.science/hal-04594762

Submitted on: Tuesday, June 11, 2024, 09:34:34

Last modified on: Tuesday, December 3, 2024, 03:18:30

Dates and versions

hal-04594762, version 1 (30-05-2024)

hal-04594762, version 2 (11-06-2024)

License

Attribution

Identifiers

HAL Id: hal-04594762, version 2
arXiv: 2406.01099

Cite

Francisco Robledo, Urtzi Ayesta, Konstantin Avrachenkov. Deep reinforcement learning for weakly coupled MDP's with continuous actions. ACM SIGMETRICS / ASMTA 2024, Jun 2024, Venice, Italy. ⟨hal-04594762v2⟩

Collections

UNIV-TLSE2 CNRS INRIA UNIV-PAU LMA-PAU INSMI UT1-CAPITOLE INRIA2 UNIV-COTEDAZUR IRIT IRIT-RMESS ANR IRIT-ASR TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

613 views

76 downloads