Hi there! I’m a Ph.D. candidate at the University of Sydney, advised by Prof. Wanli Ouyang and Prof. Zhiyong Wang. I’m also an incoming visiting research fellow at Oxford, supervised by Prof. Philip Torr. Previously, I was a Rising Star research fellow at the Shanghai AI Lab, selected by Prof. Xiaoou Tang, where I collaborated with outstanding researchers such as Dr. Amanda Shao. Before starting my Ph.D., I was part of SenseTime’s AGI group, working closely with Dr. Junjie Yan. I earned my bachelor’s degree from HUST, where I had the honor of serving as captain of the ACM-ICPC team, guided by Prof. Kun He.
News
- I’m on the job market in 2025. Curriculum Vitae
- To junior students seeking advice on early academic careers: if you’d like to chat about your career, research ideas, or potential collaborations, feel free to email me to schedule a meeting. I’d also be happy to recommend internship or study opportunities.
- 2024.12: I gave a talk at the NeurIPS 2024 Workshop on Open-World Agents, titled “Building AI Society with Foundation-Model Agents.”
- 2024.11: Thrilled to announce OASIS, a simulation platform supporting interactions among over one million LLM agents.
- 2024.07: I co-organized the ICML 2024 workshop on Multi-modal Foundation Models Meet Embodied AI (MFM-EAI).
- 2024.07: I co-organized the ICML 2024 workshop on Trustworthy Multi-modal Foundation Models and AI Agents (TiFA).
- 2024.05: I co-hosted the EgoPlan Challenge to evaluate embodied agents’ complex planning capabilities.
- 2023.11: Excited to release LAMM, a comprehensive framework for VLM training, evaluation, and applications in embodied agents.
- 2023.08: I began organizing Echo AI Talk, a weekly academic talk series featuring young researchers from around the world known for their work in generative AI and foundation models. Everyone is welcome to join!
- 2021.11: Excited to release INTERN, a series of multi-modal foundation models focused on visual representation learning.
- 2020.07: Achieved Rank 4 of 2265 in Meta’s Deepfake Detection Challenge (DFDC), which focused on identifying videos with facial or voice manipulations. Our solution is open-sourced.
- 2018.05: As a student coach, I led a team to the ACM-ICPC World Finals, achieving 31st place.
Research Highlights
My research is driven by the ambition to develop AI agents capable of operating in both physical and virtual environments. To address this challenge, my work leverages generative AI and centers on two key areas. The first is multi-modal foundation models, encompassing topics such as multi-modal representation learning, architecture design, and multi-sensory alignment. The second is agent systems, with an emphasis on practical applications including, but not limited to, embodied agents, multi-agent systems, and large-scale simulations.
Selected Publications
Topics: Multi-Modal Representation Learning / Multi-Agent Systems / Embodied AI
(*: equal contribution; ‡: corresponding author; †: project lead)

Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review
Rui Ye*, Xianghe Pang*, Jingyi Chai, Jiaao Chen, Zhenfei Yin, Zhen Xiang, Xiaowen Dong, Jing Shao, Siheng Chen‡
Preprint, 2024
OASIS: Open Agents Social Interaction Simulations on One Million Agents
Ziyi Yang*, Zaibin Zhang*, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin‡, Guohao Li‡, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao‡
NeurIPS Workshop on Open-World Agents, 2024
Paper | Project Page | Code
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin*, Zhelun Shi*, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin†, Xihui Liu, Lu Sheng, Jing Shao‡, Lei Bai‡, Wanli Ouyang, Ruimao Zhang‡
Preprint, 2024
Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation
Haoyang Su*, Renqi Chen*, Shixiang Tang‡, Xinzhe Zheng, Jingzhe Li, Zhenfei Yin, Wanli Ouyang, Nanqing Dong‡
Preprint, 2024

GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu‡, Dacheng Tao
Preprint, 2024
RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
Zeren Chen*, Zhelun Shi*, Xiaoya Lu*, Lehan He*, Sucheng Qian, Hao-Shu Fang, Zhenfei Yin†, Wanli Ouyang, Jing Shao‡, Yu Qiao, Cewu Lu, Lu Sheng‡

Assessment of Multimodal Large Language Models in Alignment with Human Values
Zhelun Shi*, Zhipin Wang*, Hongxing Fan*, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng‡, Yu Qiao, Jing Shao‡
Preprint, 2024
Paper | Project Page | Code

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control
Enshen Zhou*, Yiran Qin*, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang‡, Lu Sheng‡, Yu Qiao, Jing Shao†
NeurIPS Workshop on Open-World Agents, 2024
Paper | Project Page | Code
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
Chen Qian*, Jie Zhang*, Wei Yao*, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu‡, Jing Shao‡
The 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao‡, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao‡, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin†, Zhipin Wang
Technical Report, 2024

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You*, Zheyuan Li*, Jinjin Gu*, Zhenfei Yin, Tianfan Xue‡, Chao Dong‡
The 18th European Conference on Computer Vision, ECCV 2024
Paper | Project Page | Code

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Yiran Qin*, Enshen Zhou*, Qichang Liu*, Zhenfei Yin, Lu Sheng‡, Ruimao Zhang‡, Yu Qiao, Jing Shao†
The IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Paper | Project Page | Code

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE
Zeren Chen*, Ziqin Wang*, Zhen Wang, Huayang Liu, Zhenfei Yin†, Si Liu, Lu Sheng‡, Wanli Ouyang, Jing Shao‡
The Twelfth International Conference on Learning Representations, ICLR 2024

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Zhenfei Yin*, Jiong Wang*, Jianjian Cao*, Zhelun Shi*, Dingning Liu, Mukai Li, Xiaoshui Huang, Zhiyong Wang, Lu Sheng, Lei Bai‡, Jing Shao‡, Wanli Ouyang
The Thirty-Seventh Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2023
Paper | Project Page | Code
3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images
Yuan Yao, Yuanhan Zhang, Zhenfei Yin, Jiebo Luo, Wanli Ouyang, Xiaoshui Huang‡

Benchmarking Omni-Vision Representation Through the Lens of Visual Realms
Yuanhan Zhang, Zhenfei Yin†, Jing Shao‡, Ziwei Liu
European Conference on Computer Vision, ECCV 2022
Paper | Project Page | Code
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
Yinan He*, Gengshi Huang*, Siyu Chen*, Jianing Teng*, Kun Wang, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao‡

One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data
Yujie Wang*, Junqin Huang*, Mengya Gao*, Yichao Wu*, Zhenfei Yin, Ding Liang, Junjie Yan
Technical Report, 2021

INTERN: A New Learning Paradigm Towards General Vision
Jing Shao*, Siyu Chen*, Yangguang Li*, Kun Wang*, Zhenfei Yin*, Yinan He*, Jianing Teng*, Qinghong Sun*, Mengya Gao*, Jihao Liu*, Gengshi Huang*, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao‡
Technical Report, 2021
Professional Service
- 2023.08-Present, Academic Talk Series Organizer, Echo AI Talk
- 2024.07, Workshop Organizer, ICML 2024 workshop on Multi-modal Foundation Models Meet Embodied AI (MFM-EAI)
- 2024.07, Workshop Organizer, ICML 2024 workshop on Trustworthy Multi-modal Foundation Models and AI Agents (TiFA)
- 2024 Spring, Guest Lecturer, ELEC5304: Intelligent Visual Signal Understanding, USYD
- 2024 Spring, Teaching Assistant, COMP5425: Multimedia Retrieval, USYD
- Peer Reviewer and Program Committee Member: ICLR, NeurIPS, ICML, AAAI, ECCV, CVPR, and TPAMI