Hi there! I’m currently a postdoctoral researcher at the University of Oxford, supervised by Prof. Philip Torr. I’m also visiting Le Cong and Mengdi Wang’s Lab at Stanford and Princeton. I did my Ph.D. at the University of Sydney, working with Prof. Wanli Ouyang and Prof. Zhiyong Wang. Previously, I was a rising star research fellow at the Shanghai AI Lab selected by Prof. Xiaoou Tang, where I collaborated with outstanding researchers like Dr. Lei Bai, and Dr. Amanda Shao. I also had a wonderful time as a visitor at the Chinese University of Hong Kong. Before starting my Ph.D., I was part of SenseTime’s AGI group, working closely with Dr. Junjie Yan. I earned my bachelor’s degree from HUST, where I had the honor of being the ACM-ICPC team captain, guided by Prof. Kun He.
News
- I’m on the academic job market in 2026. Curriculum Vitae
- To junior students seeking advice on early academic careers: If you’d like to chat about your career, research ideas, or potential collaborations, feel free to email me to schedule a meeting. I’d be happy to recommend some internship or study opportunities as well.
- I’m looking for motivated students to work with me on topics such as agentic AI, multi-agent systems, embodied agents, post-training of multi-modal large language models, and related areas. A strong background in these fields is a plus, but not a must; curiosity, commitment, and a willingness to learn matter most. If you’re interested, please email me with your CV and a short note about your research interests.
- 2026.06: I am organizing four CVPR 2026 workshops: 2nd Workshop on Multi-Modal Reasoning for Agentic Intelligence (MMRAgI), ScaleBot: The First Workshop on Scalable Robot Learning Systems, Agentic AI for Visual Media, and the 6th Workshop on Adversarial Machine Learning on Computer Vision: Safety of Vision-Language Agents.
- 2026.04: I am organizing two ICLR 2026 workshops: Lifelong Agents: Learning, Aligning, Evolving and the First Workshop on Efficient Spatial Reasoning.
- 2026.04: I will give an invited talk on “Agents and World Models in Wet Lab” at the ICLR 2026 workshop on World Models: Understanding, Modelling, and Scaling.
- 2026.04: I will participate in a panel discussion at the ICLR 2026 workshop on AI with Recursive Self-Improvement.
- 2025.10: We are organizing the SFE Challenge, focusing on advancing the capabilities of foundation models as AI Scientist base models. The competition has just begun, and everyone is warmly invited to participate and contribute innovative ideas!
- 2025.08: We are organizing a competition on multi-agent embodied intelligence. All relevant details, datasets, and the platform have been released; please visit the official website for further information.
- 2025.10: I organized the ICCV 2025 workshop on Reliable and Interactable World Models: Geometry, Physics, Interactivity and Real-World Generalization.
- 2025.10: I organized the ICCV 2025 workshop on Multi-Modal Reasoning for Agentic Intelligence.
- 2025.09: Thrilled to release our survey + position paper on LLM Agent Reinforcement Learning, where we systematically define and outline the emerging paradigms of LLM RL and LLM Agent RL. Check it out on Hugging Face, and explore our curated Awesome paper list. If you find it helpful, please consider an upvote or a star!
- 2025.07: Our work VirSci was featured in Nature Feature [PDF]! The concept of Co-AI Scientists is gaining wide attention.
- 2025.07: Our paper “Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review” was covered by The Washington Post [PDF]. The piece discusses the potential impact of LLM-based reviewing on the future of the peer review ecosystem.
- 2025.07: Honored to be selected as a WAIC 2025 Yunfan Award Rising Star nominee, recognizing emerging researchers in the field of AI.
- 2025.07: I organized the ICML 2025 workshop on Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges, and Futures (MAS-2025), which was one of the most well-attended events of the entire conference!
- 2025.04: Thrilled to release MARS (Multi-Agent Robotics System), an open-source framework focusing on embodied intelligence in multi-agent settings. MARS aims to support almost all approaches based on foundation model embodied agents, spatial intelligence, and compositional intelligence (generalization and constraints). You’re welcome to follow and contribute!
- 2025.04: Excited to announce MASWorks/MASLab (a nod to MathWorks/Matlab!), an open-source framework dedicated to multi-agent systems based on LLM agents, providing all essential components for MAS research, datasets, benchmarks, codebases, and more. We’ll also be releasing a series of new research projects based on this platform. Join us in building the community!
- 2024.12: I gave a talk at the NeurIPS 2024 Workshop on Open-World Agents, titled “Building AI Society with Foundation-Model Agents.”
- 2024.11: Thrilled to announce OASIS, a simulation platform supporting interactions among over one million LLM agents.
- 2024.07: I organized the ICML 2024 workshop on Multi-modal Foundation Models Meet Embodied AI (MFM-EAI).
- 2024.07: I organized the ICML 2024 workshop on Trustworthy Multi-modal Foundation Models and AI Agents (TiFA).
- 2024.05: I co-hosted the EgoPlan Challenge to evaluate embodied agents’ complex planning capabilities.
- 2023.11: Excited to release LAMM, a comprehensive framework for VLM training, evaluation, and applications in embodied agents.
- 2023.08: I began organizing a weekly academic talk series, Echo AI Talk, inviting young researchers from around the world who are well-known for their work in generative AI, foundation models, and AI agents. Everyone is welcome to join!
- 2021.11: Excited to release Intern, a series of multi-modal foundation models focusing on visual representation learning.
- 2020.07: Achieved Rank 4 of 2265 in Meta’s DFDC competition, which focused on identifying videos with facial or voice manipulations. Our solution is open-sourced.
- 2018.05: As a student coach, I led a team to the ACM-ICPC World Finals, achieving 31st place.
Research Highlights & Profile
Zhenfei (Jeremy) Yin is a postdoctoral researcher at the University of Oxford, supervised by Prof. Philip Torr, and a visiting researcher at Stanford and Princeton. He received his Ph.D. from the University of Sydney. His research focuses on advancing the next generation of AI: systems that can not only understand and generate, but also act, adapt, and drive discovery in the real world. His work spans foundation model agents, multi-agent systems, self-evolving agents, embodied agents and robotics, and AI Scientist systems, with the goal of building general-purpose AI agents that can operate across both physical and virtual worlds and uncover new scaling laws for agent-based intelligence and automated scientific discovery.
Dr. Yin has authored 90+ papers including preprints, with 50+ papers published at top AI conferences and journals, and his work has received 2,000+ citations. He has also contributed to open-source AI projects with 20,000+ GitHub stars in total. Across agentic AI, multi-agent systems, embodied intelligence, and AI scientists, he has built research and open-source efforts that help push AI beyond passive assistance toward execution, continual learning, and innovation. His representative efforts include open platforms and systems for multimodal foundation models, large-scale agent societies, multi-agent systems, and embodied intelligence. His work has also received broader recognition beyond academia, including coverage by Nature and The Washington Post.
Selected Publications
Topics: Foundation Model Agents / Robotics / AI Scientists
(*: indicates equal contribution; ‡: indicates corresponding; †: indicates project lead)
Visit Google Scholar for the complete list of publications.

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents
Yujiong Shen, Yajie Yang, Zhiheng Xi, Binze Hu, Huayu Sha, Jiazheng Zhang, Qiyuan Peng, Junlin Shang, Jixuan Huang, Yutao Fan, Jingqi Tong, Shihan Dou, Ming Zhang, Lei Bai, Zhenfei Yin‡, Tao Gui, Xingjun Ma, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang
Preprint 2026

Charting Empirical Laws for LLM Fine-Tuning in Scientific Multi-Discipline Learning
Lintao Wang, Zhuqiang Lu, Yilin Zhu, Kun Hu, Zhenfei Yin‡, Shixiang Tang, Zhiyong Wang, Wanli Ouyang, Xinzhu Ma
Preprint 2026

TodoEvolve: Learning to Architect Agent Planning Systems
Jiaxi Liu, Yanzuo Jiang, Guibin Zhang, Zihan Zhang, Heng Chang, Zhenfei Yin‡, Qibing Ren, Junchi Yan
Preprint 2026

LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
Xinwu Ye, Yicheng Mao, Jia Zhang, Yimeng Liu, Li Hao, Fang Wu, Zhiwei Li, Yuxuan Liao, Zehong Wang, Zhiyuan Liu, Zhenfei Yin‡, Li Yuan, Philip Torr, Huan Sun, Xiangxiang Zeng, Mengdi Wang, Le Cong, Shenghua Gao, Xiangru Tang
Preprint 2026

Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching through Stock-Market Simulation
Zeping Li, Guancheng Wan, Keyang Chen, Yu Chen, Yiwen Zhao, Philip Torr, Guangnan Ye, Zhenfei Yin‡, Hongfeng Chai
Findings of the Association for Computational Linguistics, ACL 2026

Vision-deepresearch benchmark: Rethinking visual and textual search for multimodal large language models
Yu Zeng, Wenxuan Huang, Zhen Fang, Shuang Chen, Yufan Shen, Yishuo Cai, Xiaoman Wang, Zhenfei Yin, Lin Chen, Zehui Chen, Shiting Huang, Yiming Zhao, Xu Tang, Yao Hu, Philip Torr, Wanli Ouyang, Shaosheng Cao
Preprint 2026

Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents
Zeping Li, Hongru Wang, Yiwen Zhao, Guanhua Chen, Yixia Li, Keyang Chen, Yixin Cao, Guangnan Ye, Hongfeng Chai, Zhenfei Yin‡
The 64th Annual Meeting of the Association for Computational Linguistics, Main Conference, ACL 2026

Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents
Zehong Wang, Fang Wu, Hongru Wang, Xiangru Tang, Bolian Li, Zhenfei Yin, Yijun Ma, Yiyang Li, Weixiang Sun, Xiusi Chen, Yanfang Ye
Preprint 2026

Vision-deepresearch: Incentivizing deepresearch capability in multimodal large language models
Wenxuan Huang, Yu Zeng, Qiuchen Wang, Zhen Fang, Shaosheng Cao, Zheng Chu, Qingyu Yin, Shuang Chen, Zhenfei Yin, Lin Chen, Zehui Chen, Xu Tang, Yao Hu, Philip Torr, Feng Zhao, Wanli Ouyang
Preprint 2026

TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
Zhemeng Zhang, Jiahua Ma, Xincheng Yang, Xin Wen, Yuzhi Zhang, Boyan Li, Yiran Qin, Jin Liu, Can Zhao, Li Kang, Haoqin Hong, Zhenfei Yin, Philip Torr, Hao Su, Ruimao Zhang, Daolin Ma
Preprint 2026

Advances and Innovations in the Multi-Agent Robotic System (MARS) Challenge
Li Kang, Heng Zhou, Xiufeng Song, Rui Li, Bruno NY Chen, Ziye Wang, Ximeng Meng, Stone Tao, Yiran Qin, Xiaohong Liu, Ruimao Zhang, Lei Bai, Yilun Du, Hao Su, Philip Torr, Zhenfei Yin‡
Technical Report, 2026

Think3D: Thinking with Space for Spatial Reasoning
Zaibin Zhang, Yuhan Wu, Lianjie Jia, Yifan Wang, Zhongbo Zhang, Yijiang Li, Binghao Ran, Fuxi Zhang, Zhuohan Sun, Zhenfei Yin‡, Lijun Wang, Huchuan Lu
Preprint 2026

RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
Le Wang, Zonghao Ying, Xiao Yang, Quanchen Zou, Zhenfei Yin, Tianlin Li, Jian Yang, Yaodong Yang, Aishan Liu, Xianglong Liu
The First Workshop on Efficient Spatial Reasoning, ICLR 2026, Oral Presentation, Best Paper Award

From Word to World: Can Large Language Models be Implicit Text-based World Models?
Yixia Li, Hongru Wang, Jiahao Qiu, Zhenfei Yin, Dongdong Zhang, Cheng Qian, Zeping Li, Pony Ma, Guanhua Chen, Heng Ji, Mengdi Wang
The 64th Annual Meeting of the Association for Computational Linguistics, Main Conference, ACL 2026

Memory in the Age of AI Agents
Yuyang Hu*, Shichun Liu*, Yanwei Yue*, Guibin Zhang*, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, Senjie Jin, Jiejun Tan, Yanbin Yin, Jiongnan Liu, Zeyu Zhang, Zhongxiang Sun, Yutao Zhu, Hao Sun, Boci Peng, Zhenrong Cheng, Xuanbo Fan, Jiaxin Guo, Xinlei Yu, Zhenhong Zhou, Zewen Hu, Jiahao Huo, Junhao Wang, Yuwei Niu, Yu Wang, Zhenfei Yin, Xiaobin Hu, Yue Liao, Qiankun Li, Kun Wang, Wangchunshu Zhou, Yixin Liu, Dawei Cheng, Qi Zhang, Tao Gui, Shirui Pan, Yan Zhang, Philip Torr, Zhicheng Dou, Ji-Rong Wen, Xuanjing Huang, Yu-Gang Jiang, Shuicheng Yan
Preprint 2025

LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge
Heng Zhou*, Ao Yu*, Yuchen Fan*, Jianing Shi, Li Kang, Hejia Geng, Yongting Zhang, Yutao Fan, Yuhao Wu, Tiancheng He, Yiran Qin, Lei Bai‡, Zhenfei Yin‡
Preprint 2025

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
Zonghao Ying*, Yangguang Shao*, Jianle Gan, Gan Xu, Junjie Shen, Wenxin Zhang, Quanchen Zou, Junzheng Shi, Zhenfei Yin, Mingchuan Zhang, Aishan Liu, Xianglong Liu
Findings of the Association for Computational Linguistics, ACL 2026

CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Xiangyuan Xue, Yifan Zhou, Guibin Zhang, Zaibin Zhang, Yijiang Li, Chen Zhang, Zhenfei Yin‡, Philip Torr, Wanli Ouyang, Lei Bai‡
The Fourteenth International Conference on Learning Representations, ICLR 2026

A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory
Qianshan Wei*, Tengchao Yang*, Yaochen Wang*, Xinfeng Li‡, Lijun Li, Zhenfei Yin, Yi Zhan, Thorsten Holz, Zhiqiang Lin, XiaoFeng Wang
Preprint 2025

LatentEvolve: Self-Evolving Test-Time Scaling in Latent Space
Guibin Zhang, Fanci Meng, Guancheng Wan, Zherui Li, Kun Wang, Zhenfei Yin, Lei Bai, Shuicheng Yan
Preprint 2025

Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
Zelin Tan, Hejia Geng, Mulei Zhang, Xiaohang Yu, Guancheng Wan, Yifan Zhou, Qiang He, Xiangyuan Xue, Heng Zhou, Yutao Fan, Zhongzhi Li, Zaibin Zhang, Guibin Zhang, Chen Zhang‡, Zhenfei Yin‡, Lei Bai
The 64th Annual Meeting of the Association for Computational Linguistics, Main Conference, ACL 2026, Oral Presentation

Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts
Guancheng Wan*, Leixin Sun*, Longxu Dou, Zitong Shi, Fang Wu, Eric Hanchen Jiang, Wenke Huang, Guibin Zhang, Hejia Geng, Xiangru Tang, Zhenfei Yin‡, Yizhou Sun, Wei Wang
Preprint 2025

Eigen-Agent: Adaptive Multi-Agent Scientific Reasoning with Monitor-Based RAG
Xiangru Tang*, Wanghan Xu*, Yujie Wang*, Zijie Guo*, Daniel Shao, Jiapeng Chen, Cixuan Zhang, Ziyi Wang, Lixin Zhang, Guancheng Wan, Wenlong Zhang, Lei Bai, Zhenfei Yin‡, Philip Torr, Hanrui Wang, Di Jin
The Fourteenth International Conference on Learning Representations, ICLR 2026

Interleaving Reasoning for Better Text-to-Image Generation
Wenxuan Huang, Shuang Chen, Zheyong Xie, Shaosheng Cao‡, Shixiang Tang, Yufan Shen, Qingyu Yin, Wenbo Hu, Xiaoman Wang, Yuntian Tang, Junbo Qiao, Yue Guo, Yao Hu, Zhenfei Yin‡, Philip Torr, Yu Cheng, Wanli Ouyang, Shaohui Lin‡
The Fourteenth International Conference on Learning Representations, ICLR 2026

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Guibin Zhang*, Hejia Geng*, Xiaohang Yu*, Zhenfei Yin‡, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, Yifan Zhou, Yang Chen, Chen Zhang, Yutao Fan, Zihu Wang, Songtao Huang, Yue Liao, Hongru Wang, Mengyue Yang, Heng Ji, Michael Littman, Jun Wang, Shuicheng Yan, Philip Torr, Lei Bai‡
Transactions on Machine Learning Research, TMLR 2026

BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
Zhiheng Xi*, Guanyu Li*, Yutao Fan*, Honglin Guo*, Yufang Liu, Xiaoran Fan, Jiaqi Liu, Jingchao Ding, Wangmeng Zuo, Zhenfei Yin‡, Lei Bai, Tao Ji, Tao Gui‡, Qi Zhang, Philip Torr, Xuanjing Huang
The Thirty-Ninth Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2025
PDF | Project Page | Code

VeriGUI: Verifiable Long-Chain GUI Dataset
Shunyu Liu*, Minghao Liu*, Huichi Zhou, Zhenyu Cui, Yang Zhou, Yuhao Zhou, Wendong Fan, Ge Zhang, Jiajun Shi, Weihao Xuan, Jiaxing Huang, Shuang Luo, Fang Wu, Heli Qi, Qingcheng Zeng, Ziqi Ren, Jialiang Gao, Jindi Lv, Junjie Wang, Aosong Feng, Heng Zhou, Wangchunshu Zhou, Zhenfei Yin, Wenlong Zhang, Guohao Li, Wenhao Yu, Irene Li, Lei Ma, Lei Bai, Qunshu Lin, Mingli Song, Dacheng Tao
Preprint 2025

VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang*, Xiufeng Song*, Heng Zhou*, Yiran Qin‡, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai‡, Zhenfei Yin‡
The Thirty-Ninth Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2025

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents
Rui Li*, Zixuan Hu*, Wenxi Qu*, Jinouwen Zhang, Zhenfei Yin, Sha Zhang, Xuantuo Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang, Wanli Ouyang, Lei Bai, Wangmeng Zuo, Ling-Yu Duan, Dongzhan Zhou‡, Shixiang Tang‡
The Thirty-Ninth Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2025

X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs
Rui Ye*, Xiangrui Liu*, Qimin Wu, Xianghe Pang, Zhenfei Yin, Lei Bai, Siheng Chen‡
Preprint 2025

MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems
Rui Ye, Keduan Huang, Qimin Wu, Yuzhu Cai, Tian Jin, Xianghe Pang, Xiangrui Liu, Jiaqi Su, Chen Qian, Bohan Tang, Kaiqu Liang, Jiaao Chen, Yue Hu, Zhenfei Yin, Rongye Shi, Bo An, Yang Gao, Wenjun Wu, Lei Bai‡, Siheng Chen‡
Preprint 2025

AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research
Renqi Chen*, Haoyang Su*, Shixiang Tang, Zhenfei Yin, Qi Wu, Hui Li, Ye Sun, Nanqing Dong‡, Wanli Ouyang, Philip Torr
Preprint 2025

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Yiran Qin*, Li Kang*, Xiufeng Song*, Zhenfei Yin‡, Xiaohong Liu, Xihui Liu, Ruimao Zhang‡, Lei Bai‡
International Conference on Computer Vision, ICCV 2025

ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks
Heng Zhou*, Hejia Geng*, Xiangyuan Xue, Li Kang, Yiran Qin, Zhiyong Wang, Zhenfei Yin‡, Lei Bai‡
Empirical Methods in Natural Language Processing, EMNLP 2025, Oral Presentation, SAC Highlight Award, Outstanding Paper Candidates(Top 1%)

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Xindi Yang, Baolu Li, Yiming Zhang, Zhenfei Yin‡, Lei Bai‡, Liqian Ma, Zhiyong Wang, Jianfei Cai, Tien-Tsin Wong, Huchuan Lu, Xu Jia‡
International Conference on Computer Vision, ICCV 2025

MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems
Rui Ye, Shuo Tang, Rui Ge, Yaxin Du, Zhenfei Yin, Siheng Chen‡, Jing Shao‡
Forty-Second International Conference on Machine Learning, ICML 2025
ICLR 2025 Workshop on Reasoning and Planning for Large Language Models, 2025

Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review
Rui Ye*, Xianghe Pang*, Jingyi Chai, Jiaao Chen, Zhenfei Yin, Zhen Xiang, Xiaowen Dong, Jing Shao, Siheng Chen‡
Preprint, 2024

B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens
Zhuqiang Lu, Zhenfei Yin‡, Mengwei He, Zhihui Wang, Zicheng Liu, Zhiyong Wang, Kun Hu‡
International Conference on Computer Vision, ICCV 2025
OASIS: Open Agents Social Interaction Simulations on One Million Agents
Ziyi Yang*, Zaibin Zhang*, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin‡, Guohao Li‡, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao‡
NeurIPS Workshop on Open-World Agents, 2024
PDF | Project Page | Code
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin*, Zhelun Shi*, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin†, Xihui Liu, Lu Sheng, Jing Shao‡, Lei Bai‡, Wanli Ouyang, Ruimao Zhang‡
Forty-Second International Conference on Machine Learning, ICML 2025
Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation
Haoyang Su*, Renqi Chen*, Shixiang Tang‡, Xinzhe Zheng, Jingzhe Li, Zhenfei Yin, Wanli Ouyang, Nanqing Dong‡
The 63rd Annual Meeting of the Association for Computational Linguistics, Main Conference, ACL 2025

GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu‡, Dacheng Tao
Preprint, 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang*, Lu Chen*, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui‡, Jing Shao‡
The IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025
RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
Zeren Chen*, Zhelun Shi*, Xiaoya Lu*, Lehan He*, Sucheng Qian, Hao Shu Fang, Zhenfei Yin†, Wanli Ouyang, Jing Shao‡, Yu Qiao, Cewu Lu, Lu Sheng‡
IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025

Assessment of Multimodal Large Language Models in Alignment with Human Values
Zhelun Shi*, Zhipin Wang*, Hongxing Fan*, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng‡, Yu Qiao, Jing Shao‡
Preprint, 2024
PDF | Project Page | Code

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control
Enshen Zhou*, Yiran Qin*, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang‡, Lu Sheng‡, Yu Qiao, Jing Shao†
IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025
NeurIPS Workshop on Open-World Agents, 2024
PDF | Project Page | Code
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
Chen Qian*, Jie Zhang*, Wei Yao*, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu‡, Jing Shao‡
The 62nd Annual Meeting of the Association for Computational Linguistics, Findings, ACL 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao‡, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao‡, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin†, Zhipin Wang
Technical Report, 2024

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You*, Zheyuan Li*, Jinjin Gu*, Zhenfei Yin, Tianfan Xue‡, Chao Dong‡
The 18th European Conference on Computer Vision, ECCV 2024
PDF | Project Page | Code

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Yiran Qin*, Enshen Zhou*, Qichang Liu*, Zhenfei Yin, Lu Sheng‡, Ruimao Zhang‡, Yu Qiao, Jing Shao†
The IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PDF | Project Page | Code


Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE
Zeren Chen*, Ziqin Wang*, Zhen Wang, Huayang Liu, Zhenfei Yin†, Si Liu, Lu Sheng‡, Wanli Ouyang, Jing Shao‡
The Twelfth International Conference on Learning Representations, ICLR 2024

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Zhenfei Yin*, Jiong Wang*, Jianjian Cao*, Zhelun Shi*, Dingning Liu, Mukai Li, Xiaoshui Huang, Zhiyong Wang, Lu Sheng, Lei Bai‡, Jing Shao‡, Wanli Ouyang
The Thirty-Seventh Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2023
PDF | Project Page | Code
3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images
Yuan Yao, Yuanhan Zhang, Zhenfei Yin, Jiebo Luo, Wanli Ouyang, Xiaoshui Huang‡

Benchmarking Omni-Vision Representation Through the Lens of Visual Realms
Yuanhan Zhang, Zhenfei Yin†, Jing Shao‡, Ziwei Liu
European Conference on Computer Vision, 2022
PDF | Project Page | Code
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
Yinan He*, Gengshi Huang*, Siyu Chen*, Jianing Teng*, Kun Wang, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao‡


INTERN: A New Learning Paradigm Towards General Vision
Jing Shao*, Siyu Chen*, Yangguang Li*, Kun Wang*, Zhenfei Yin*, Yinan He*, Jianing Teng*, Qinghong Sun*, Mengya Gao*, Jihao Liu*, Gengshi Huang*, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao‡
Technical Report, 2021
Professional Service
- 2023.08-Present, Academic-Talk Event Organizer, Echo AI Talk
- 2024.07, Workshop Organizer, ICML 2024 workshop on Multi-modal Foundation Model meets Embodied AI (MFM-EAI)
- 2024.07, Workshop Organizer, ICML 2024 workshop on Trustworthy Multi-modal Foundation Models and AI Agents (TiFA)
- 2024 Spring, Guest Lecture, ELEC5304: Intelligent Visual Signal Understanding, USYD
- 2024 Spring, Teaching Assistant, COMP 5425: Multimedia Retrieval, USYD
- Peer Review and Program Committee, ICLR, NeurIPS, ICML, ARR, AAAI, ICCV, ECCV, CVPR, ACMMM, and TPAMI