Hi there! I’m currently a Ph.D. candidate at the University of Sydney, working under the guidance of Prof. Wanli Ouyang and Prof. Zhiyong Wang. I’m also a visiting research fellow at the University of Oxford, supervised by Prof. Philip Torr. Previously, I was a rising-star research fellow at the Shanghai AI Lab, selected by Prof. Xiaoou Tang, where I collaborated with outstanding researchers such as Dr. Lei Bai and Dr. Amanda Shao. I also had a wonderful time as a visitor at the Chinese University of Hong Kong. Before starting my Ph.D., I was part of SenseTime’s AGI group, working closely with Dr. Junjie Yan. I earned my bachelor’s degree from HUST, where I had the honor of serving as captain of the ACM-ICPC team, guided by Prof. Kun He.
News
- I’m on the job market in 2025. Curriculum Vitae
- To junior students seeking advice on an early academic career: if you’d like to chat about your career, research ideas, or potential collaborations, feel free to email me to schedule a meeting. I’d also be happy to recommend internship or study opportunities.
- 2025.07: I’m organizing the ICML 2025 workshop on Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges, and Futures (MAS-2025). Logistics are not yet finalized, so feel free to reach out about submissions, keynotes, social events, or funding!
- 2025.04: Thrilled to release MARS (Multi-Agent Robotics System), an open-source framework focusing on embodied intelligence in multi-agent settings. MARS aims to support most approaches built on foundation-model embodied agents, spatial intelligence, and compositional intelligence (generalization and constraints). You’re welcome to follow and contribute!
- 2025.04: Excited to announce MASWorks/MASLab (a nod to MathWorks/Matlab!), an open-source framework dedicated to LLM-based multi-agent systems, providing the essential components for MAS research: datasets, benchmarks, codebases, and more. We’ll also be releasing a series of new research projects built on this platform. Join us in building the community!
- 2024.12: I gave a talk at the NeurIPS 2024 Workshop on Open-World Agents, titled “Building AI Society with Foundation-Model Agents.”
- 2024.11: Thrilled to announce OASIS, a simulation platform supporting interactions among over one million LLM agents.
- 2024.07: I organized the ICML 2024 workshop on Multi-modal Foundation Models Meet Embodied AI (MFM-EAI).
- 2024.07: I organized the ICML 2024 workshop on Trustworthy Multi-modal Foundation Models and AI Agents (TiFA).
- 2024.05: I co-hosted the EgoPlan Challenge to evaluate embodied agents’ complex planning capabilities.
- 2023.11: Excited to release LAMM, a comprehensive framework for VLM training, evaluation, and applications in embodied agents.
- 2023.08: I began organizing a weekly academic talk series, Echo AI Talk, inviting young researchers from around the world who are well-known for their work in generative AI, foundation models, and AI agents. Everyone is welcome to join!
- 2021.11: Excited to release INTERN, a series of multi-modal foundation models focusing on visual representation learning.
- 2020.07: Achieved rank 4 out of 2,265 teams in Meta’s Deepfake Detection Challenge (DFDC), which focused on identifying videos with facial or voice manipulations. Our solution is open-sourced.
- 2018.05: As a student coach, I led a team to the ACM-ICPC World Finals, achieving 31st place.
Research Highlights
My research is driven by the ambition to develop AI agents capable of operating in both physical and virtual environments. To this end, my work leverages generative AI and centers on two key areas. The first is multi-modal foundation models, encompassing topics such as multi-modal representation learning, architecture design, and multi-sensory alignment. The second is agent systems, with an emphasis on practical applications including, but not limited to, embodied agents, multi-agent systems, and large-scale simulations.
Selected Publications
Topics: Multi-Modal Representation Learning / Multi-Agent Systems / Embodied AI
(*: equal contribution; ‡: corresponding author; †: project lead)

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Yiran Qin*, Li Kang*, Xiufeng Song*, Zhenfei Yin‡, Xiaohong Liu, Xihui Liu, Ruimao Zhang‡, Lei Bai‡
Preprint, 2025

An AI researchers’ perspective: At the crossroad of LLMs, agent-based modeling, and complex systems: Comment on “LLMs and generative agent-based models for complex systems research”
Siyue Ren, Ziyue Gan, Zhenfei Yin, Jing Shao, Shuyue Hu
Physics of Life Reviews 53, 215-217, 2025

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Xindi Yang, Baolu Li, Yiming Zhang, Zhenfei Yin‡, Lei Bai‡, Liqian Ma, Zhiyong Wang, Jianfei Cai, Tien-Tsin Wong, Huchuan Lu, Xu Jia‡
Preprint, 2025

MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems
Rui Ye, Shuo Tang, Rui Ge, Yaxin Du, Zhenfei Yin, Siheng Chen‡, Jing Shao‡
Forty-Second International Conference on Machine Learning, ICML 2025; also presented at the ICLR 2025 Workshop on Reasoning and Planning for Large Language Models

Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review
Rui Ye*, Xianghe Pang*, Jingyi Chai, Jiaao Chen, Zhenfei Yin, Zhen Xiang, Xiaowen Dong, Jing Shao, Siheng Chen‡
Preprint, 2024

OASIS: Open Agents Social Interaction Simulations on One Million Agents
Ziyi Yang*, Zaibin Zhang*, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin‡, Guohao Li‡, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao‡
NeurIPS Workshop on Open-World Agents, 2024
Paper | Project Page | Code

WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin*, Zhelun Shi*, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin†, Xihui Liu, Lu Sheng, Jing Shao‡, Lei Bai‡, Wanli Ouyang, Ruimao Zhang‡
Forty-Second International Conference on Machine Learning, ICML 2025

Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation
Haoyang Su*, Renqi Chen*, Shixiang Tang‡, Xinzhe Zheng, Jingzhe Li, Zhenfei Yin, Wanli Ouyang, Nanqing Dong‡
Preprint, 2024

GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu‡, Dacheng Tao
Preprint, 2024

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang*, Lu Chen*, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui‡, Jing Shao‡
The IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
Zeren Chen*, Zhelun Shi*, Xiaoya Lu*, Lehan He*, Sucheng Qian, Hao-Shu Fang, Zhenfei Yin†, Wanli Ouyang, Jing Shao‡, Yu Qiao, Cewu Lu, Lu Sheng‡

Assessment of Multimodal Large Language Models in Alignment with Human Values
Zhelun Shi*, Zhipin Wang*, Hongxing Fan*, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng‡, Yu Qiao, Jing Shao‡
Preprint, 2024
Paper | Project Page | Code

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control
Enshen Zhou*, Yiran Qin*, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang‡, Lu Sheng‡, Yu Qiao, Jing Shao†
NeurIPS Workshop on Open-World Agents, 2024
Paper | Project Page | Code

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
Chen Qian*, Jie Zhang*, Wei Yao*, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu‡, Jing Shao‡
The 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao‡, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao‡, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin†, Zhipin Wang
Technical Report, 2024

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You*, Zheyuan Li*, Jinjin Gu*, Zhenfei Yin, Tianfan Xue‡, Chao Dong‡
The 18th European Conference on Computer Vision, ECCV 2024
Paper | Project Page | Code

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Yiran Qin*, Enshen Zhou*, Qichang Liu*, Zhenfei Yin, Lu Sheng‡, Ruimao Zhang‡, Yu Qiao, Jing Shao†
The IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Paper | Project Page | Code

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE
Zeren Chen*, Ziqin Wang*, Zhen Wang, Huayang Liu, Zhenfei Yin†, Si Liu, Lu Sheng‡, Wanli Ouyang, Jing Shao‡
The Twelfth International Conference on Learning Representations, ICLR 2024

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Zhenfei Yin*, Jiong Wang*, Jianjian Cao*, Zhelun Shi*, Dingning Liu, Mukai Li, Xiaoshui Huang, Zhiyong Wang, Lu Sheng, Lei Bai‡, Jing Shao‡, Wanli Ouyang
The Thirty-Seventh Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2023
Paper | Project Page | Code

3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images
Yuan Yao, Yuanhan Zhang, Zhenfei Yin, Jiebo Luo, Wanli Ouyang, Xiaoshui Huang‡

Benchmarking Omni-Vision Representation Through the Lens of Visual Realms
Yuanhan Zhang, Zhenfei Yin†, Jing Shao‡, Ziwei Liu
The 17th European Conference on Computer Vision, ECCV 2022
Paper | Project Page | Code

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
Yinan He*, Gengshi Huang*, Siyu Chen*, Jianing Teng*, Kun Wang, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao‡

One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data
Yujie Wang*, Junqin Huang*, Mengya Gao*, Yichao Wu*, Zhenfei Yin, Ding Liang, Junjie Yan
Technical Report, 2021

INTERN: A New Learning Paradigm Towards General Vision
Jing Shao*, Siyu Chen*, Yangguang Li*, Kun Wang*, Zhenfei Yin*, Yinan He*, Jianing Teng*, Qinghong Sun*, Mengya Gao*, Jihao Liu*, Gengshi Huang*, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao‡
Technical Report, 2021
Professional Service
- 2023.08-Present, Organizer, Echo AI Talk (weekly academic talk series)
- 2024.07, Workshop Organizer, ICML 2024 workshop on Multi-modal Foundation Models Meet Embodied AI (MFM-EAI)
- 2024.07, Workshop Organizer, ICML 2024 workshop on Trustworthy Multi-modal Foundation Models and AI Agents (TiFA)
- 2024 Spring, Guest Lecturer, ELEC5304: Intelligent Visual Signal Understanding, USYD
- 2024 Spring, Teaching Assistant, COMP5425: Multimedia Retrieval, USYD
- Peer Reviewer and Program Committee Member: ICLR, NeurIPS, ICML, ARR, AAAI, ICCV, ECCV, CVPR, ACM MM, and TPAMI