About Me

I am a second-year PhD student at Fudan University, where my research focuses on reinforcement learning. Prior to entering my PhD program, I completed a two-year master's degree in statistics at Fudan University and earned a bachelor's degree in management from Shanghai University of Finance and Economics.

Education

Research Interests

My research primarily focuses on reinforcement learning (RL), including: (1) theoretical analysis and algorithm design in RL, and (2) the application of RL to large-scale real-world problems. I am also interested in combining RL algorithms with large language models. If you are interested in working with me, feel free to drop me an email.

News

  • [2025-02] I gave a remote research talk to Csaba Szepesvari and his research group on the global convergence of policy gradient methods (slides available here).

  • [2025-01] Two papers submitted to ICML 2025!

  • [2025-01] Our paper “$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee” got accepted to ICLR 2025!

  • [2025-01] Our paper “On the Linear Convergence of Policy Gradient under Hadamard Parametrization” got accepted to Information and Inference: A Journal of the IMA.

  • [2024-11] We have released our Skywork o1 Open collection on Hugging Face! The collection includes a model with o1-like CoT and two process reward models.

  • [2024-10] Our technical report “Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs” has been released on arXiv!

Papers

  • Jiacai Liu, Wenye Li and Ke Wei. On the Convergence of Policy Mirror Descent with Temporal Difference Evaluation. (coming soon …)

  • Jiacai Liu, Chaojie Wang, Chris Yuhao Liu, Liang Zeng, Rui Yan, Yiwen Sun, Yang Liu and Yahui Zhou. Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization. arXiv preprint arXiv:2412.18279, 2024.

  • Chris Yuhao Liu, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu and Yahui Zhou. Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs. arXiv preprint arXiv:2410.18451, 2024.

  • Wenye Li*, Jiacai Liu* and Ke Wei. $\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee. In The Thirteenth International Conference on Learning Representations, ICLR 2025.

  • Jiacai Liu, Wenye Li and Ke Wei. Elementary Analysis of Policy Gradient Methods. arXiv preprint arXiv:2404.03372, 2024.

  • Jiacai Liu, Wenye Li and Ke Wei. On the Convergence of Projected Policy Gradient for Any Constant Step Sizes. arXiv preprint arXiv:2311.01104, 2023.

  • Jiacai Liu, Jinchi Chen and Ke Wei. On the Linear Convergence of Policy Gradient under Hadamard Parametrization. Information and Inference: A Journal of the IMA, 2025.

Talks

Awards

  • 2023.06, IJCAI 2023 AI Olympics Competition, Champion.
  • 2021.06, Chinese Collegiate Computing Competition, 1st Prize.
  • 2020.06, Chinese Collegiate Computing Competition, 3rd Prize.