Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
Published in arXiv, 2024. https://arxiv.org/pdf/2412.18279.
Recommended citation: Jiacai Liu, Chaojie Wang, Chris Yuhao Liu, Liang Zeng, Rui Yan, Yiwen Sun, Yang Liu, and Yahui Zhou. "Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization." arXiv:2412.18279, 2024.