Hi there! I’m currently a Ph.D. candidate at the University of Sydney, advised by Prof. Wanli Ouyang and Prof. Zhiyong Wang. I’m also an incoming visiting research fellow at the University of Oxford, supervised by Prof. Philip Torr. Previously, I was a Rising Star research fellow at the Shanghai AI Lab, selected by Prof. Xiaoou Tang, where I collaborated with outstanding researchers such as Dr. Amanda Shao. Before starting my Ph.D., I was part of SenseTime’s AGI group, working closely with Dr. Junjie Yan. I earned my bachelor’s degree from HUST, where I had the honor of serving as the ACM-ICPC team captain, guided by Prof. Kun He.

News

  • I’m on the job market in 2025. Curriculum Vitae
  • To junior students seeking advice on early academic careers: if you’d like to chat about your career, research ideas, or potential collaborations, feel free to email me to schedule a meeting. I’m also happy to recommend internship and study opportunities.
  • 2024.12: I gave a talk at the NeurIPS 2024 Workshop on Open-World Agents, titled “Building AI Society with Foundation-Model Agents.”
  • 2024.11: Thrilled to announce OASIS, a simulation platform supporting interactions among over one million LLM agents.
  • 2024.07: I co-organized the ICML 2024 workshop on Multi-modal Foundation Models Meet Embodied AI (MFM-EAI).
  • 2024.07: I co-organized the ICML 2024 workshop on Trustworthy Multi-modal Foundation Models and AI Agents (TiFA).
  • 2024.05: I co-hosted the EgoPlan Challenge to evaluate embodied agents’ complex planning capabilities.
  • 2023.11: Excited to release LAMM, a comprehensive framework for VLM training, evaluation, and applications in embodied agents.
  • 2023.08: I began organizing a weekly academic talk series, Echo AI Talk, featuring young researchers from around the world recognized for their work in generative AI and foundation models. Everyone is welcome to join!
  • 2021.11: Excited to release INTERN, a series of multi-modal foundation models focused on visual representation learning.
  • 2020.07: Ranked 4th out of 2,265 teams in Meta’s DFDC competition, which focused on identifying videos with facial or voice manipulations. Our solution is open-sourced.
  • 2018.05: As a student coach, I led a team to the ACM-ICPC World Finals, achieving 31st place.

Research Highlights

My research is driven by the ambition to develop AI agents capable of operating in both physical and virtual environments. To address this challenge, my work leverages generative AI and centers on two key areas. The first is multi-modal foundation models, encompassing topics such as multi-modal representation learning, architecture design, and multi-sensory alignment. The second is agent systems, with an emphasis on practical applications, including but not limited to embodied agents, multi-agent systems, and large-scale simulations.

Selected Publications

Topics: Multi-Modal Representation Learning / Multi-Agent Systems / Embodied AI

(*: equal contribution; ‡: corresponding author; †: project lead)

Preprint

Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review

Rui Ye*, Xianghe Pang*, Jingyi Chai, Jiaao Chen, Zhenfei Yin, Zhen Xiang, Xiaowen Dong, Jing Shao, Siheng Chen

Preprint, 2024

Paper | Project Page

NeurIPS-W 2024

OASIS: Open Agent Social Interaction Simulations with One Million Agents

Ziyi Yang*, Zaibin Zhang*, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin, Guohao Li, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao

NeurIPS Workshop on Open-World Agents, 2024

Paper | Project Page | Code

Preprint

WorldSimBench: Towards Video Generation Models as World Simulators

Yiran Qin*, Zhelun Shi*, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin, Xihui Liu, Lu Sheng, Jing Shao, Lei Bai, Wanli Ouyang, Ruimao Zhang

Preprint, 2024

Paper | Project Page

Preprint

Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation

Haoyang Su*, Renqi Chen*, Shixiang Tang, Xinzhe Zheng, Jingzhe Li, Zhenfei Yin, Wanli Ouyang, Nanqing Dong

Preprint, 2024

Paper | Project Page

Preprint

GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing

Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu, Dacheng Tao

Preprint, 2024

Paper

Preprint

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

Yongting Zhang*, Lu Chen*, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao

Preprint, 2024

Paper | Code

NeurIPS-W 2024

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

Zeren Chen*, Zhelun Shi*, Xiaoya Lu*, Lehan He*, Sucheng Qian, Hao-Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

NeurIPS Workshop on Open-World Agents, 2024

Paper | Project Page

Preprint

Assessment of Multimodal Large Language Models in Alignment with Human Values

Zhelun Shi*, Zhipin Wang*, Hongxing Fan*, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

Preprint, 2024

Paper | Project Page | Code

NeurIPS-W 2024

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

Enshen Zhou*, Yiran Qin*, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao

NeurIPS Workshop on Open-World Agents, 2024

Paper | Project Page | Code

ACL 2024

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models

Chen Qian*, Jie Zhang*, Wei Yao*, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao

The 62nd Annual Meeting of the Association for Computational Linguistics, 2024

Paper | Code

Tech. Report

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin, Zhipin Wang

Technical Report, 2024

Paper

ECCV 2024

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

Zhiyuan You*, Zheyuan Li*, Jinjin Gu*, Zhenfei Yin, Tianfan Xue, Chao Dong

The 18th European Conference on Computer Vision, ECCV 2024

Paper | Project Page | Code

CVPR 2024

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Yiran Qin*, Enshen Zhou*, Qichang Liu*, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao

The IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

Paper | Project Page | Code

Tech. Report

ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models

Zhelun Shi*, Zhipin Wang*, Hongxing Fan*, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

Technical Report, 2024

Paper | Code

ICLR 2024

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

Zeren Chen*, Ziqin Wang*, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Jing Shao

The Twelfth International Conference on Learning Representations, ICLR 2024

Paper | Code

NeurIPS 2023

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

Zhenfei Yin*, Jiong Wang*, Jianjian Cao*, Zhelun Shi*, Dingning Liu, Mukai Li, Xiaoshui Huang, Zhiyong Wang, Lu Sheng, Lei Bai, Jing Shao, Wanli Ouyang

The Thirty-Seventh Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2023

Paper | Project Page | Code

ICME 2024

3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images

Yuan Yao, Yuanhan Zhang, Zhenfei Yin, Jiebo Luo, Wanli Ouyang, Xiaoshui Huang

IEEE International Conference on Multimedia and Expo, 2024

Paper

ECCV 2022

Benchmarking Omni-Vision Representation Through the Lens of Visual Realms

Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu

European Conference on Computer Vision, 2022

Paper | Project Page | Code

ECCV 2022

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

Yinan He*, Gengshi Huang*, Siyu Chen*, Jianing Teng*, Kun Wang, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao

European Conference on Computer Vision, 2022

Paper

IJCV

Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy

Yuanhan Zhang*, Qinghong Sun*, Yichun Zhou*, Zexin He*, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu

International Journal of Computer Vision, 2022

Paper | Code

Tech. Report

One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data

Yujie Wang*, Junqin Huang*, Mengya Gao*, Yichao Wu*, Zhenfei Yin, Ding Liang, Junjie Yan

Technical Report, 2021

Paper

Tech. Report

INTERN: A New Learning Paradigm Towards General Vision

Jing Shao*, Siyu Chen*, Yangguang Li*, Kun Wang*, Zhenfei Yin*, Yinan He*, Jianing Teng*, Qinghong Sun*, Mengya Gao*, Jihao Liu*, Gengshi Huang*, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Technical Report, 2021

Paper | Code

Professional Service