Hi there! I’m currently a postdoctoral researcher at the University of Oxford, supervised by Prof. Philip Torr. I’m also visiting the labs of Le Cong at Stanford and Mengdi Wang at Princeton. I did my Ph.D. at the University of Sydney, working with Prof. Wanli Ouyang and Prof. Zhiyong Wang. Previously, I was a rising star research fellow at the Shanghai AI Lab, selected by Prof. Xiaoou Tang, where I collaborated with outstanding researchers such as Dr. Lei Bai and Dr. Amanda Shao. I also had a wonderful time as a visitor at the Chinese University of Hong Kong. Before starting my Ph.D., I was part of SenseTime’s AGI group, working closely with Dr. Junjie Yan. I earned my bachelor’s degree from HUST, where I had the honor of serving as ACM-ICPC team captain, guided by Prof. Kun He.

News

Research Highlights

My research aims to develop general-purpose AI agents capable of operating in both physical and virtual environments. My work primarily focuses on foundation model post-training, multi-agent systems, self-evolving agents, and embodied agents and robotics, with the goal of improving the autonomy, adaptability, and coordination capabilities of intelligent systems in complex environments.

A central application of this research is the development of a General AI Scientist for scientific discovery across both the natural and social sciences. These systems leverage AI agents and embodied robotic agents to autonomously explore research problems, generate hypotheses, conduct simulations or experiments, and discover new knowledge. Through this framework, my research aims to uncover new scaling laws of agent-based intelligence and automated scientific discovery in large-scale interactive environments.

Selected Publications

Topics: Foundation Model Agents / Robotics / AI Scientists

(*: equal contribution; ‡: corresponding author; †: project lead)

Preprint

Memory in the Age of AI Agents

Yuyang Hu*, Shichun Liu*, Yanwei Yue*, Guibin Zhang*, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, Senjie Jin, Jiejun Tan, Yanbin Yin, Jiongnan Liu, Zeyu Zhang, Zhongxiang Sun, Yutao Zhu, Hao Sun, Boci Peng, Zhenrong Cheng, Xuanbo Fan, Jiaxin Guo, Xinlei Yu, Zhenhong Zhou, Zewen Hu, Jiahao Huo, Junhao Wang, Yuwei Niu, Yu Wang, Zhenfei Yin, Xiaobin Hu, Yue Liao, Qiankun Li, Kun Wang, Wangchunshu Zhou, Yixin Liu, Dawei Cheng, Qi Zhang, Tao Gui, Shirui Pan, Yan Zhang, Philip Torr, Zhicheng Dou, Ji-Rong Wen, Xuanjing Huang, Yu-Gang Jiang, Shuicheng Yan

Preprint 2025

PDF

Preprint

LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge

Heng Zhou*, Ao Yu*, Yuchen Fan*, Jianing Shi, Li Kang, Hejia Geng, Yongting Zhang, Yutao Fan, Yuhao Wu, Tiancheng He, Yiran Qin, Lei Bai, Zhenfei Yin

Preprint 2025

PDF

Preprint

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

Zonghao Ying*, Yangguang Shao*, Jianle Gan, Gan Xu, Junjie Shen, Wenxin Zhang, Quanchen Zou, Junzheng Shi, Zhenfei Yin, Mingchuan Zhang, Aishan Liu, Xianglong Liu

Preprint 2025

PDF

Preprint

CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

Xiangyuan Xue, Yifan Zhou, Guibin Zhang, Zaibin Zhang, Yijiang Li, Chen Zhang, Zhenfei Yin, Philip Torr, Wanli Ouyang, Lei Bai

Preprint 2025

PDF

Preprint

A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory

Qianshan Wei*, Tengchao Yang*, Yaochen Wang*, Xinfeng Li, Lijun Li, Zhenfei Yin, Yi Zhan, Thorsten Holz, Zhiqiang Lin, XiaoFeng Wang

Preprint 2025

PDF

Preprint

LatentEvolve: Self-Evolving Test-Time Scaling in Latent Space

Guibin Zhang, Fanci Meng, Guancheng Wan, Zherui Li, Kun Wang, Zhenfei Yin, Lei Bai, Shuicheng Yan

Preprint 2025

PDF

Preprint

Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning

Zelin Tan, Hejia Geng, Mulei Zhang, Xiaohang Yu, Guancheng Wan, Yifan Zhou, Qiang He, Xiangyuan Xue, Heng Zhou, Yutao Fan, Zhongzhi Li, Zaibin Zhang, Guibin Zhang, Chen Zhang, Zhenfei Yin, Lei Bai

Preprint 2025

PDF

Preprint

Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts

Guancheng Wan*, Leixin Sun*, Longxu Dou, Zitong Shi, Fang Wu, Eric Hanchen Jiang, Wenke Huang, Guibin Zhang, Hejia Geng, Xiangru Tang, Zhenfei Yin, Yizhou Sun, Wei Wang

Preprint 2025

PDF

Preprint

Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning

Xiangru Tang*, Wanghan Xu*, Yujie Wang*, Zijie Guo*, Daniel Shao, Jiapeng Chen, Cixuan Zhang, Ziyi Wang, Lixin Zhang, Guancheng Wan, Wenlong Zhang, Lei Bai, Zhenfei Yin, Philip Torr, Hanrui Wang, Di Jin

Preprint 2025

PDF

Preprint

Interleaving Reasoning for Better Text-to-Image Generation

Wenxuan Huang, Shuang Chen, Zheyong Xie, Shaosheng Cao, Shixiang Tang, Yufan Shen, Qingyu Yin, Wenbo Hu, Xiaoman Wang, Yuntian Tang, Junbo Qiao, Yue Guo, Yao Hu, Zhenfei Yin, Philip Torr, Yu Cheng, Wanli Ouyang, Shaohui Lin

Preprint 2025

PDF

Preprint

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Guibin Zhang*, Hejia Geng*, Xiaohang Yu*, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, Yifan Zhou, Yang Chen, Chen Zhang, Yutao Fan, Zihu Wang, Songtao Huang, Yue Liao, Hongru Wang, Mengyue Yang, Heng Ji, Michael Littman, Jun Wang, Shuicheng Yan, Philip Torr, Lei Bai

Preprint 2025

PDF

NeurIPS 2025

BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset

Zhiheng Xi*, Guanyu Li*, Yutao Fan*, Honglin Guo*, Yufang Liu, Xiaoran Fan, Jiaqi Liu, Jingchao Ding, Wangmeng Zuo, Zhenfei Yin, Lei Bai, Tao Ji, Tao Gui, Qi Zhang, Philip Torr, Xuanjing Huang

The Thirty-Ninth Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2025

PDF | Project Page | Code

Preprint

When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems

Qibing Ren, Sitao Xie, Longxuan Wei, Zhenfei Yin, Junchi Yan, Lizhuang Ma, Jing Shao

Preprint 2025

PDF | Code

Preprint

VeriGUI: Verifiable Long-Chain GUI Dataset

Shunyu Liu*, Minghao Liu*, Huichi Zhou, Zhenyu Cui, Yang Zhou, Yuhao Zhou, Wendong Fan, Ge Zhang, Jiajun Shi, Weihao Xuan, Jiaxing Huang, Shuang Luo, Fang Wu, Heli Qi, Qingcheng Zeng, Ziqi Ren, Jialiang Gao, Jindi Lv, Junjie Wang, Aosong Feng, Heng Zhou, Wangchunshu Zhou, Zhenfei Yin, Wenlong Zhang, Guohao Li, Wenhao Yu, Irene Li, Lei Ma, Lei Bai, Qunshu Lin, Mingli Song, Dacheng Tao

Preprint 2025

PDF | Code

Preprint

Position: Intelligent Science Laboratory Requires the Integration of Cognitive and Embodied AI

Sha Zhang, Suorong Yang, Tong Xie, Xiangyuan Xue, Zixuan Hu, Rui Li, Wenxi Qu, Zhenfei Yin, Tianfan Fu, Di Hu, Andres M Bran, Nian Ran, Bram Hoex, Wangmeng Zuo, Philippe Schwaller, Wanli Ouyang, Lei Bai, Yanyong Zhang, Lingyu Duan, Shixiang Tang, Dongzhan Zhou

Preprint 2025

PDF

NeurIPS 2025

VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Li Kang*, Xiufeng Song*, Heng Zhou*, Yiran Qin, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai, Zhenfei Yin

The Thirty-Ninth Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2025

PDF

NeurIPS 2025

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents

Rui Li*, Zixuan Hu*, Wenxi Qu*, Jinouwen Zhang, Zhenfei Yin, Sha Zhang, Xuantuo Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang, Wanli Ouyang, Lei Bai, Wangmeng Zuo, Ling-Yu Duan, Dongzhan Zhou, Shixiang Tang

The Thirty-Ninth Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2025

PDF

Preprint

X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs

Rui Ye*, Xiangrui Liu*, Qimin Wu, Xianghe Pang, Zhenfei Yin, Lei Bai, Siheng Chen

Preprint 2025

PDF

Preprint

MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems

Rui Ye, Keduan Huang, Qimin Wu, Yuzhu Cai, Tian Jin, Xianghe Pang, Xiangrui Liu, Jiaqi Su, Chen Qian, Bohan Tang, Kaiqu Liang, Jiaao Chen, Yue Hu, Zhenfei Yin, Rongye Shi, Bo An, Yang Gao, Wenjun Wu, Lei Bai, Siheng Chen

Preprint 2025

PDF

Preprint

AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research

Renqi Chen*, Haoyang Su*, Shixiang Tang, Zhenfei Yin, Qi Wu, Hui Li, Ye Sun, Nanqing Dong, Wanli Ouyang, Philip Torr

Preprint 2025

PDF

ICCV 2025

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Yiran Qin*, Li Kang*, Xiufeng Song*, Zhenfei Yin, Xiaohong Liu, Xihui Liu, Ruimao Zhang, Lei Bai

International Conference on Computer Vision, ICCV 2025

PDF | Project Page

EMNLP 2025

ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks

Heng Zhou*, Hejia Geng*, Xiangyuan Xue, Li Kang, Yiran Qin, Zhiyong Wang, Zhenfei Yin, Lei Bai

Empirical Methods in Natural Language Processing, EMNLP 2025, Oral Presentation, SAC Highlight Award, Outstanding Paper Candidate (Top 1%)

PDF | Code

Physics of Life Reviews 53

An AI researchers’ perspective: At the crossroad of LLMs, agent-based modeling, and complex systems: Comment on “LLMs and generative agent-based models for complex systems research”

Siyue Ren, Ziyue Gan, Zhenfei Yin, Jing Shao, Shuyue Hu

Physics of Life Reviews 53, 215-217, 2025

PDF

ICCV 2025

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Xindi Yang, Baolu Li, Yiming Zhang, Zhenfei Yin, Lei Bai, Liqian Ma, Zhiyong Wang, Jianfei Cai, Tien-Tsin Wong, Huchuan Lu, Xu Jia

International Conference on Computer Vision, ICCV 2025

PDF | Project Page

ICML 2025

MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems

Rui Ye, Shuo Tang, Rui Ge, Yaxin Du, Zhenfei Yin, Siheng Chen, Jing Shao

Forty-Second International Conference on Machine Learning, ICML 2025

ICLR 2025 Workshop on Reasoning and Planning for Large Language Models, 2025

PDF

Preprint

Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review

Rui Ye*, Xianghe Pang*, Jingyi Chai, Jiaao Chen, Zhenfei Yin, Zhen Xiang, Xiaowen Dong, Jing Shao, Siheng Chen

Preprint, 2024

PDF | Project Page

ICCV 2025

B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens

Zhuqiang Lu, Zhenfei Yin, Mengwei He, Zhihui Wang, Zicheng Liu, Zhiyong Wang, Kun Hu

International Conference on Computer Vision, ICCV 2025

PDF | Code

NeurIPS-W 2024

OASIS: Open Agents Social Interaction Simulations on One Million Agents

Ziyi Yang*, Zaibin Zhang*, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin, Guohao Li, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao

NeurIPS Workshop on Open-World Agents, 2024

PDF | Project Page | Code

ICML 2025

WorldSimBench: Towards Video Generation Models as World Simulators

Yiran Qin*, Zhelun Shi*, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin, Xihui Liu, Lu Sheng, Jing Shao, Lei Bai, Wanli Ouyang, Ruimao Zhang

Forty-Second International Conference on Machine Learning, ICML 2025

PDF | Project Page

ACL 2025

Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation

Haoyang Su*, Renqi Chen*, Shixiang Tang, Xinzhe Zheng, Jingzhe Li, Zhenfei Yin, Wanli Ouyang, Nanqing Dong

The 63rd Annual Meeting of the Association for Computational Linguistics, Main Conference, ACL 2025

PDF | Project Page

Preprint

GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing

Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu, Dacheng Tao

Preprint, 2024

PDF

CVPR 2025

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

Yongting Zhang*, Lu Chen*, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao

The IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025

PDF | Code

IROS 2025

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

Zeren Chen*, Zhelun Shi*, Xiaoya Lu*, Lehan He*, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025

NeurIPS Workshop on Open-World Agents, 2024

PDF | Project Page

Preprint

Assessment of Multimodal Large Language Models in Alignment with Human Values

Zhelun Shi*, Zhipin Wang*, Hongxing Fan*, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

Preprint, 2024

PDF | Project Page | Code

IROS 2025

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

Enshen Zhou*, Yiran Qin*, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao

IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025

NeurIPS Workshop on Open-World Agents, 2024

PDF | Project Page | Code

ACL 2024

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models

Chen Qian*, Jie Zhang*, Wei Yao*, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao

The 62nd Annual Meeting of the Association for Computational Linguistics, Findings, ACL 2024

PDF | Code

Tech. Report

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin, Zhipin Wang

Technical Report, 2024

PDF

ECCV 2024

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

Zhiyuan You*, Zheyuan Li*, Jinjin Gu*, Zhenfei Yin, Tianfan Xue, Chao Dong

The 18th European Conference on Computer Vision, ECCV 2024

PDF | Project Page | Code

CVPR 2024

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Yiran Qin*, Enshen Zhou*, Qichang Liu*, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao

The IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

PDF | Project Page | Code

Tech. Report

ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models

Zhelun Shi*, Zhipin Wang*, Hongxing Fan*, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

Technical Report, 2024

PDF | Code

ICLR 2024

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

Zeren Chen*, Ziqin Wang*, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Jing Shao

The Twelfth International Conference on Learning Representations, ICLR 2024

PDF | Code

NeurIPS 2023

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

Zhenfei Yin*, Jiong Wang*, Jianjian Cao*, Zhelun Shi*, Dingning Liu, Mukai Li, Xiaoshui Huang, Zhiyong Wang, Lu Sheng, Lei Bai, Jing Shao, Wanli Ouyang

The Thirty-Seventh Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2023

PDF | Project Page | Code

ICME 2024

3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images

Yuan Yao, Yuanhan Zhang, Zhenfei Yin, Jiebo Luo, Wanli Ouyang, Xiaoshui Huang

IEEE International Conference on Multimedia and Expo, 2024

PDF

ECCV 2022

Benchmarking Omni-Vision Representation Through the Lens of Visual Realms

Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu

European Conference on Computer Vision, 2022

PDF | Project Page | Code

ECCV 2022

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

Yinan He*, Gengshi Huang*, Siyu Chen*, Jianing Teng*, Kun Wang, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao

European Conference on Computer Vision, 2022

PDF

IJCV

Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy

Yuanhan Zhang*, Qinghong Sun*, Yichun Zhou*, Zexin He*, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu

International Journal of Computer Vision, DOI: 10.1007/s11263-025-02450-2

PDF | Code

Tech. Report

One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data

Yujie Wang*, Junqin Huang*, Mengya Gao*, Yichao Wu*, Zhenfei Yin, Ding Liang, Junjie Yan

Technical Report, 2021

PDF

Tech. Report

INTERN: A New Learning Paradigm Towards General Vision

Jing Shao*, Siyu Chen*, Yangguang Li*, Kun Wang*, Zhenfei Yin*, Yinan He*, Jianing Teng*, Qinghong Sun*, Mengya Gao*, Jihao Liu*, Gengshi Huang*, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Technical Report, 2021

PDF | Code

Professional Service