Research
I'm interested in LLM reasoning and planning, especially in using reinforcement learning during the post-training stage to improve models' reasoning capabilities.
|
|
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
Tianle Wang, Zhaoyang Wang, Guangchen Lan, Xinpeng Wei, Sipeng Zhang, GuanWen Qiu, Abulhair Saparov
Preprint, 2026
arxiv /
Create a synthetic logical reasoning environment to study RL scaling and downstream transfer under long-horizon reasoning.
|
|
AnyPrefer: An Automatic Framework for Preference Data Synthesis
Yiyang Zhou*, Zhaoyang Wang*, Tianle Wang*, Shangyu Xing, Peng Xia, Bo Li, Kaiyuan Zheng, Zijian Zhang, Zhaorun Chen, Wenhao Zheng, Xuchao Zhang, Chetan Bansal, Weitong Zhang, Ying Wei, Mohit Bansal, Huaxiu Yao
The Thirteenth International Conference on Learning Representations, 2025
arxiv /
Build a framework to synthesize high-quality data for preference training.
|
|
WOT-Class: Weakly Supervised Open-world Text Classification
Tianle Wang, Zihan Wang, Weitang Liu, Jingbo Shang
The Conference on Information and Knowledge Management, 2023
arxiv /
code /
slides /
Propose WOT-Class, a novel framework for weakly supervised open-world text classification.
|
|
A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches
Zihan Wang*, Tianle Wang*, Dheeraj Mekala, Jingbo Shang
The 61st Annual Meeting of the Association for Computational Linguistics (Findings), 2023
arxiv /
code /
Develop a benchmark to compare extremely weakly supervised text classification methods, reconciling seed-matching and prompting approaches.
|
|