Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
Jiacai Liu and Chaojie Wang and Chris Yuhao Liu and Liang Zeng and Rui Yan and Yiwen Sun and Yang Liu and Yahui Zhou. arXiv:2412.18279, 2024.
Ph.D. in Reinforcement Learning, Fudan University (2023 - 2027)
Supervisor: Ke Wei
M.S. in Statistics, Fudan University (2021 - 2023)
B.M., Shanghai University of Finance and Economics (2017 - 2021)
Ranked 1st in the Gaokao in Guangyuan City, Sichuan Province
Jiacai Liu, Chaojie Wang, Chris Yuhao Liu, Liang Zeng, Rui Yan, Yiwen Sun, Yang Liu, and Yahui Zhou. Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization. arXiv:2412.18279, 2024.
Wenye Li, Jiacai Liu, and Ke Wei. $\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore.
Jiacai Liu, Wenye Li, and Ke Wei. Elementary Analysis of Policy Gradient Methods. arXiv:2404.03372, 2024.
Jiacai Liu, Wenye Li, and Ke Wei. On the Convergence of Projected Policy Gradient for Any Constant Step Sizes. arXiv:2311.0110, 2023.
Jiacai Liu, Jinchi Chen, and Ke Wei. On the Linear Convergence of Policy Gradient under Hadamard Parameterization. arXiv:2305.19575, 2023.
Remote research talk at Csaba Szepesvari's research group, University of Alberta
Presentation at Applied Math Ph.D. Seminar, Rm 1801, Guanghua East Tower