About Me
I am currently a third-year PhD student at Fudan University, where my research focuses on reinforcement learning. Prior to entering my PhD program, I completed a two-year master’s degree in statistics at Fudan University and earned a bachelor’s degree in management from Shanghai University of Finance and Economics.
Education
Ph.D. in Reinforcement Learning, Fudan University (2023 - 2026)
Supervisor: Ke Wei
M.S. in Statistics, Fudan University (2021 - 2023)
B.M. in Management, Shanghai University of Finance and Economics (2017 - 2021)
I ranked 1st in the GaoKao in Guangyuan City, Sichuan Province.
Research Interests
My research primarily focuses on reinforcement learning (RL), including:
- Theoretical analysis and algorithm design in RL.
- The application of RL to large-scale real-world problems.

I’m also interested in combining RL algorithms with large language models. If you are interested in working with me, feel free to drop me an email.
News
- [2025-09] We wrote a Notion blog “When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch” to discuss the training-inference mismatch issue in RL and potential solutions.
- [2025-09] Our paper “DAPO: Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage-Based Policy Optimization” has been accepted to NeurIPS 2025 as a Spotlight!
- [2025-07] Our paper “On the Convergence of Projected Policy Gradient for Any Constant Step Sizes” was accepted to JMLR!
- [2025-05] We have released the Skywork-OR1 (Open Reasoner 1) series of models, a collection of state-of-the-art 7B and 32B models specialized in math and code. We open-sourced the model weights, data, and training code, and released a technical report sharing detailed training recipes along with extensive experimental results, analysis, and insights.
- [2025-02] I gave a remote research talk to Csaba Szepesvari and his research group on the global convergence of policy gradient methods (slides: click here).
- [2025-01] Our paper “$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee” was accepted to ICLR 2025!
- [2025-01] Our paper “On the Linear Convergence of Policy Gradient under Hadamard Parametrization” was accepted to Information and Inference: A Journal of the IMA.
- [2024-11] We have released our Skywork o1 Open collection on Hugging Face! The collection includes a model with o1-like CoT and two process reward models.
- [2024-10] Our technical report “Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs” has been released on arXiv!
Publications
When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch (Technical Blog)
Jiacai Liu*, Yingru Li*, Yuqian Fu, Jiawei Wang, Qian Liu and Yu Shen
On the Convergence of Policy Mirror Descent with Temporal Difference Evaluation (Under Review)
Jiacai Liu, Wenye Li and Ke Wei.
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents (Under Review)
Jiawei Wang, Jiacai Liu, Yuqian Fu, Yingru Li, Xintao Wang, Yuan Lin, Yu Yue, Lin Zhang, Yang Wang and Ke Wang
Skywork Open Reasoner 1 Technical Report (Technical Report)
Jujie He, Jiacai Liu, Chris Yuhao Liu, Rui Yan, Chaojie Wang, Peng Cheng, Xiaoyu Zhang, Fuxiang Zhang, Jiacheng Xu, Wei Shen, Siyuan Li, Liang Zeng, Tianwen Wei, Cheng Cheng, Bo An, Yang Liu and Yahui Zhou
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage-Based Policy Optimization (NeurIPS 2025 Spotlight)
Jiacai Liu, Chaojie Wang, Chris Yuhao Liu, Liang Zeng, Rui Yan, Yiwen Sun, Yang Liu and Yahui Zhou
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs (Technical Report)
Chris Yuhao Liu, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu and Yahui Zhou
$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee (ICLR 2025)
Wenye Li*, Jiacai Liu* and Ke Wei.
Elementary Analysis of Policy Gradient Methods (Under Review)
Jiacai Liu, Wenye Li and Ke Wei.
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes (JMLR)
Jiacai Liu, Wenye Li, Dachao Lin, Ke Wei and Zhihua Zhang
On the Linear Convergence of Policy Gradient under Hadamard Parametrization (Information and Inference: A Journal of the IMA)
Jiacai Liu, Jinchi Chen and Ke Wei
*: Equal contribution
Talks
Some Progress on the Convergence of Policy Gradient Methods
Remote research talk at Csaba Szepesvari's research group, University of Alberta
Projected Policy Gradient Converges in a Finite Number of Iterations
Presentation at Applied Math Ph.D. Seminar, Rm 1801, Guanghua East Tower
Awards
- 2023.06, IJCAI 2023 AI Olympics Competition, Champion.
- 2021.06, Chinese Collegiate Computing Competition, 1st Prize.
- 2020.06, Chinese Collegiate Computing Competition, 3rd Prize.