Some Progress on the Convergence of Policy Gradient Methods

Date:

Abstract: In this talk, I'll present some novel convergence results for the policy gradient method and its variants from my papers. There are three main results I would like to present: 1. The projected policy gradient converges in a finite number of iterations for any given constant step size. 2. The exact softmax policy gradient converges to the optimal policy for any given constant step size. 3. The $\phi$-update, a class of policy update methods that goes beyond the policy gradient descent (or mirror descent) framework, comes with a policy convergence guarantee.
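For concreteness, here is a minimal sketch of the tabular updates that results 1 and 2 refer to; the notation (value $V^{\pi}(\mu)$, step size $\eta$, direct and softmax parameterizations) is the standard one in this literature and is my own choice, not taken from the abstract:

\[
\text{Projected PG (direct parameterization):}\qquad
\pi^{(t+1)} \;=\; \mathrm{Proj}_{\Delta(\mathcal{A})^{|\mathcal{S}|}}\!\left(\pi^{(t)} + \eta\,\nabla_{\pi} V^{\pi^{(t)}}(\mu)\right),
\]
\[
\text{Softmax PG:}\qquad
\theta^{(t+1)} \;=\; \theta^{(t)} + \eta\,\nabla_{\theta} V^{\pi_{\theta^{(t)}}}(\mu),
\qquad
\pi_{\theta}(a \mid s) \;=\; \frac{\exp(\theta_{s,a})}{\sum_{a'}\exp(\theta_{s,a'})}.
\]

Result 1 concerns the first (projected) update and result 2 the second (softmax) update; the $\phi$-update of result 3 generalizes beyond both.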

Slides: https://drive.google.com/file/d/1ZM8YQGPM4Gx3s4M_AjYUEVppuo0m25/view