About Me
I’m Yuxi /ˈjʊˈʃiː/. I completed my Ph.D. at WING, National University of Singapore, advised by Prof. Min-Yen Kan.
My research focuses on reinforcement learning for reasoning, with a central interest in the co-evolution of policy and evaluator. I study how reward signals shape reasoning behavior in language and multimodal models, why current self-improvement systems converge to static evaluation criteria, and how reward hacking emerges as a structural consequence of this asymmetry. A related thread examines the limits of self-evaluation: when verification signals come from the policy itself, calibration ceilings and coherence-correctness gaps constrain what closed-loop training can achieve. My thesis, Closed-Loop Scaling, develops a synthesis of these questions through the lens of inference-time search, training-time alignment, and architectural grounding.
I visited the Natural Language Processing Group, University of California, Santa Barbara (UCSB) in AY 2024/2025, advised by Prof. William Yang Wang. Before NUS, I worked with Prof. Yansong Feng at Peking University’s Wangxuan Institute of Computer Technology, and received my B.S. in Data Science and Big Data Technology from the School of Electronics Engineering and Computer Science at Peking University.
My full name written in Chinese is 谢雨汐.
Contact: xieyuxi [at] u.nus.edu or yuxi.sigrid.xie [at] gmail.com
