Publications
A collection of my research work.

I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing
Jinghan Yu¹*, Junhao Xiao¹*, Chenyu Zhu¹, Jiaming Li¹, Jia Li¹, HanMing Deng¹, Xirui Wang¹, Guoli Jia², Jianjun Li¹, Zhiyuan Ma¹†, Xiang Bai¹, Bowen Zhou²,³
¹Huazhong University of Science and Technology, ²Tsinghua University, ³Shanghai AI Laboratory
ACL 2026 Under Review, arXiv:2601.03741
We propose I2E, a novel "Decompose-then-Action" paradigm for text-guided image editing. I2E transforms unstructured images into discrete, manipulable object layers. It then employs a physics-aware Vision-Language-Action Agent that parses complex instructions into atomic actions via Chain-of-Thought reasoning. I2E significantly outperforms state-of-the-art methods on compositional editing tasks.
🌟 This work is a tiny step toward my vision of unified intelligence: guiding agents to reason about inter-layer relationships and to manipulate objects within a structured visual environment under real-world physical constraints.
Paper
Code