Boosting reinforcement learning with sparse and rare rewards using Fleming-Viot particle systems
Abstract
We consider reinforcement learning control problems under the average reward criterion in which non-zero rewards are both sparse and rare, that is, they occur in very few states and those states have a very small steady-state probability. Using Renewal Theory and Fleming-Viot particle systems, we propose a novel approach that exploits prior knowledge of the sparse structure of the environment to boost exploration of the non-zero rewards. We also show how to combine the methodology with a policy gradient algorithm to construct the FVRL method, which efficiently solves structured control problems in these scenarios. We provide theoretical convergence guarantees for both the steady-state probability estimator and the policy gradient learner. Finally, we illustrate the method on an M/M/1/K queue control problem where the objective is to determine the optimum blocking threshold K. Our results show that FVRL learns the optimum blocking threshold much more efficiently than vanilla Monte-Carlo reinforcement learning.
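To give a sense of what "rare" means in the M/M/1/K setting, the following minimal sketch computes the textbook stationary distribution of an M/M/1/K queue, where the probability of holding n jobs is proportional to rho^n. It is not the FVRL estimator from the paper. The function name `mm1k_stationary`, the load parameter `rho` (arrival rate divided by service rate), and the choice of the full-buffer state as the rare state are assumptions made for illustration; the abstract does not specify them.

```python
def mm1k_stationary(rho: float, K: int) -> list[float]:
    """Stationary distribution of an M/M/1/K queue (states 0..K).

    Standard birth-death result: pi_n is proportional to rho**n.
    `rho` is the assumed load (arrival rate / service rate).
    """
    if rho <= 0 or K < 0:
        raise ValueError("rho must be positive and K non-negative")
    weights = [rho ** n for n in range(K + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values only (not taken from the paper).
pi = mm1k_stationary(rho=0.5, K=20)
blocking_probability = pi[-1]  # probability that the buffer is full
print(f"P(buffer full) = {blocking_probability:.3e}")
```

With a light load and a moderately large K, the full-buffer state is visited so seldom that a plain Monte-Carlo trajectory would need a very long run to observe it even once, which is the kind of situation the abstract describes.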
Main file
2022 - EWRL - Mastropietro, Majewski, Ayesta, Jonckheere - Boosting Reinforcement Learning with Sparse and Rare Rewards with Fleming-Viot Particle Systems.pdf (1 Mo)
Origin: Files produced by the author(s)