Chenyu Zhu

Undergraduate Student

Huazhong University of Science and Technology

Research Interests

Unified Multimodal Models
World Models
MLLMs
Diffusion Post Training

About

I'm a third-year undergraduate at Huazhong University of Science and Technology (HUST), driven by a simple belief: No risk, full push! 🚀

I believe that general intelligence demands more than reasoning alone: it requires the ability to see, imagine, and create. Language, perception, and generation are not isolated capabilities but deeply intertwined facets of a unified mind. My research explores this vision at the intersection of 🧠 Unified Multimodal Models, 👀 MLLMs, 🎨 Diffusion RL, and 🌐 World Models, working toward AI systems that jointly reason and imagine across both digital and physical worlds. I believe unified intelligence is the path to artificial general intelligence (AGI).

I'm always open to discussions and collaborations, so feel free to reach out! 💬

News

2026-03

Work on a new framework for Diffusion RL is in progress!

2026-01

🎉🎉 I2E, my first collaborative work with colleagues from Tsinghua University and Shanghai AI Laboratory, has been submitted to ACL 2026!

Selected Publications

I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing
Image Editing

Jinghan Yu1*, Junhao Xiao1*, Chenyu Zhu1, Jiaming Li1, Jia Li1, HanMing Deng1, Xirui Wang1, Guoli Jia2, Jianjun Li1, Zhiyuan Ma1†, Xiang Bai1, Bowen Zhou2,3

1Huazhong University of Science and Technology, 2Tsinghua University, 3Shanghai AI Laboratory

ACL 2026 Under Review, arXiv:2601.03741

We propose I2E, a novel "Decompose-then-Action" paradigm that transforms unstructured images into discrete, manipulable object layers and employs a physics-aware Vision-Language-Action Agent to parse complex instructions into atomic actions via Chain-of-Thought reasoning, significantly outperforming state-of-the-art methods on compositional editing tasks.
🌟 This work is a tiny step toward my vision of unified intelligence: guiding agents to reason about inter-layer relationships and manipulate objects within a structured visual environment under real-world physical constraints.

Education

  • 2023.09 - 2027.06 (current), Undergraduate Student, School of EIC, Huazhong University of Science and Technology (HUST), Wuhan, China
  • 2020.09 - 2023.06, Senior High School Student, Suzhou High School of Jiangsu Province, Suzhou, China

Internships

  • Actively seeking internships in MLLMs and AIGC.