Chenyu Zhu

A collection of my research work.

TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment
arXiv:2605.10983 / 2026
We propose TMPO, a trajectory-level reward distribution matching framework for aligning diffusion and flow models. It replaces scalar reward maximization with a Softmax Trajectory Balance objective, preserves coverage over acceptable generation trajectories, and accelerates multi-trajectory training with Dynamic Stochastic Tree Sampling. Across human preference, compositional generation, and text rendering tasks, TMPO improves generative diversity by 9.1% while maintaining competitive reward and efficiency. This work extends my exploration of unified intelligence from structured visual manipulation to distribution-aware diffusion post-training: aligning generative agents with preference signals while preserving diverse, plausible visual worlds.
I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing
ACL 2026 (Main Conference), arXiv:2601.03741 / 2026
We propose I2E, a novel "Decompose-then-Action" paradigm for text-guided image editing. I2E transforms unstructured images into discrete, manipulable object layers, then employs a physics-aware Vision-Language-Action (VLA) agent that uses Chain-of-Thought reasoning to parse complex instructions into atomic actions. It significantly outperforms state-of-the-art methods on compositional editing tasks. 🌟 This work is a tiny step toward my vision of unified intelligence: guiding agents to reason about inter-layer relationships and manipulate objects within a structured visual environment under real-world physical constraints.