Posts by Collection

talks

Some Progress on the Convergence of Policy Gradient Methods

Published:

Abstract: In this talk, I will present some novel convergence results for the policy gradient method and its variants from my papers. There are three main results I would like to present:

1. The projected policy gradient converges in a finite number of iterations for any constant step size.
2. The exact softmax policy gradient converges to the optimum for any constant step size.
3. The $\phi$-update: a class of policy update methods with a policy convergence guarantee that goes beyond the policy gradient descent (or mirror descent) framework.
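To make the second result concrete, here is a minimal sketch of exact softmax policy gradient ascent on a single-state MDP (a bandit) with a constant step size. The setup, rewards, and step size are illustrative assumptions, not the setting from the talk: the policy is $\pi = \mathrm{softmax}(\theta)$, the objective is $J(\theta) = \sum_a \pi_a r_a$, and the exact gradient is $\partial J / \partial \theta_a = \pi_a (r_a - J)$.

```python
import math

def softmax(theta):
    # Numerically stable softmax over logits theta.
    m = max(theta)
    z = [math.exp(t - m) for t in theta]
    s = sum(z)
    return [v / s for v in z]

def softmax_pg(r, eta=1.0, iters=5000):
    # Exact softmax policy gradient ascent with a constant step size eta.
    # Illustrative single-state MDP (bandit): J(theta) = sum_a pi_a * r_a,
    # and the exact gradient is dJ/dtheta_a = pi_a * (r_a - J).
    theta = [0.0] * len(r)
    for _ in range(iters):
        pi = softmax(theta)
        J = sum(p * ri for p, ri in zip(pi, r))
        theta = [t + eta * p * (ri - J) for t, p, ri in zip(theta, pi, r)]
    return softmax(theta)

# Hypothetical rewards for three actions; the policy concentrates on the
# highest-reward action as the number of iterations grows.
pi = softmax_pg([1.0, 0.5, 0.2])
```

In this toy setting the iterates drive the policy toward the optimal (highest-reward) action without any step-size decay, which is the flavor of guarantee the talk's results establish in the general tabular case.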
