I am a first-year Ph.D. student in the Department of Electrical and Computer Engineering at Princeton University, advised by Prof. Chi Jin.
Previously, I completed my undergraduate studies at Yuanpei College, Peking University.
I am interested in the intersection of RL and LLMs, especially certifiable reasoning.
I view exploration as the core challenge in RL, and test-time search as necessary to achieve it for LLM agents.
The key technical problem is how search procedures and expert decision-making systems can be internalized as reasoning ability, rather than remaining external scaffolding.
I see two tightly coupled aspects: backfilling expert search behavior into the model through learning, and on-the-fly calibration that lets the model assess uncertainty and decide when to search, explore, or trust its own prediction.
Publications
LeAct: Learning to Reason from Expert Actions
Ziran Yang, Chengshuai Shi, Raj Ghugare, Benjamin Eysenbach, Karthik Narasimhan, Chi Jin
Under review
Distilling certified expert action systems (game solvers, classical planners, theorem provers) into LLM chain-of-thought.