Some Progress on the Convergence of Policy Gradient Methods
Remote research talk, Csaba Szepesvari's research group, University of Alberta
Abstract: In this talk, I will present some novel convergence results for the policy gradient method and its variants from my papers. There are three main results: 1. The projected policy gradient converges in a finite number of iterations for any given constant step size. 2. The exact softmax policy gradient converges to the optimum for any given constant step size. 3. $\phi$-update, a class of policy update methods that goes beyond the policy gradient descent (or mirror descent) framework while retaining a policy convergence guarantee.
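For orientation, the first two results concern updates that are standard in the tabular setting; the notation below (step size $\eta$, initial state distribution $\mu$) is a generic sketch and not taken from the abstract, so the precise assumptions in the papers may differ. With the softmax parameterization $\pi_\theta(a \mid s) = \exp(\theta_{s,a}) / \sum_{a'} \exp(\theta_{s,a'})$, the exact softmax policy gradient update with a constant step size $\eta > 0$ is
$$\theta_{t+1} = \theta_t + \eta \, \nabla_\theta V^{\pi_{\theta_t}}(\mu),$$
where $V^{\pi}(\mu)$ is the value of policy $\pi$ under the initial state distribution $\mu$. The projected policy gradient instead updates the policy directly and projects back onto the product of probability simplices:
$$\pi_{t+1} = \mathrm{Proj}_{\Delta(\mathcal{A})^{\mathcal{S}}}\!\big(\pi_t + \eta \, \nabla_\pi V^{\pi_t}(\mu)\big).$$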