Runtao Liu

I am a Ph.D. candidate at the Hong Kong University of Science and Technology (HKUST), advised by Prof. Qifeng Chen. I am also a Visiting Student at the University of Oxford in the Torr Vision Group, working with Prof. Philip Torr. Previously, I was a Ph.D. student at Johns Hopkins University (2021-2022), advised by Prof. Alan Yuille. I received my Master's degree from Peking University (2020) and my Bachelor's degree from Beijing University of Posts and Telecommunications (2017). I have also enjoyed my time as a research intern at Disney Research Zürich working with Dr. Yang Zhang, as an assistant researcher at Microsoft Research Asia working with Dr. Zhirong Wu and Dr. Steve Lin, and as a visiting student at UC Berkeley working with Prof. Stella Yu and at Johns Hopkins University working with Prof. Alan Yuille.

My research interests include Generative AI, Reinforcement Learning, Agents, and Visual Generation/Understanding:

Agent/RL for GenAI: LongVideoAgent, VideoDPO, SafetyDPO, Bootstrapped Preference Optimization
Visual understanding and generation: CLEVR-Ref+, Appearance Motion Decomposition, Unsupervised Sketch-to-Photo Synthesis
GenAI Safety & Robustness: Robust-R1, SafetyDPO, Latent Guard

News

I am seeking full-time research positions in industry and academia starting in 2026. Please feel free to contact me if you are interested in my research.

Dec. 24, 2025: LongVideoAgent is released on arXiv.

Research Experience

University of Oxford, Visiting Student (June 2023 - Present)
Advisor: Prof. Philip Torr
Disney Research Zürich, Research Intern (June 2025 - Sep 2025)
Advisor: Dr. Yang Zhang
Microsoft Research Asia, AI Residency (Aug 2020 - Aug 2021)
Mentors: Dr. Zhirong Wu, Dr. Steve Lin
University of California, Berkeley, Visiting Student (Summer 2019)
Advisor: Prof. Stella Yu
Johns Hopkins University, Visiting Student (Summer 2018)
Advisor: Prof. Alan Yuille

Selected Research

GenAI RL/Reasoning/Agent

LongVideoAgent: Multi-Agent Reasoning with Long Videos
R Liu, Z Liu, J Tang, Y Ma, R Pi, J Zhang, Q Chen
arXiv 2025, under review
Fake it till You Make it: Reward Modeling as Discriminative Prediction
R Liu, J Zhan, Y He, C Wei, Alan Yuille, Qifeng Chen
arXiv 2025, under review
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
R Liu, H Wu, Z Ziqiang, C Wei, Y He, R Pi, Qifeng Chen
CVPR 2025
AlignGuard: Scalable Safety Alignment for Text-to-Image Generation
R Liu, IC Chen, J Gu, J Zhang, R Pi, Q Chen, Philip Torr, Ashkan Khakzar, Fabio Pizzati
ICCV 2025
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
B Yang, Q Yang, Y Ma, R Liu
EMNLP 2025 Findings
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Z Mi, KC Wang, G Qian, H Ye, R Liu, S Tulyakov, K Aberman, Dan Xu
ICML 2025 (Poster)
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan, Tong Zhang
ECCV 2024 (Oral)
Pointing to a Llama and Call it a Camel: On the Sycophancy of Multimodal Large Language Models
R Pi, K Miao, P Li, R Liu, J Gao, J Zhang, X Zhou
EMNLP 2025 Main
VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training
J Zhang, K Miao, R Pi, Z Wang, R Liu, R Pan, Tong Zhang
arXiv 2025, under review
LLMs Meet Multimodal Generation and Editing: A Survey
Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu, ..., Qifeng Chen
arXiv 2024, in progress

Visual Generation and Understanding

Latent Guard: a Safety Framework for Text-to-image Generation
R Liu, A Khakzar, J Gu, Q Chen, Philip Torr, Fabio Pizzati
ECCV 2024 (Poster)
Unsupervised Sketch-to-Photo Synthesis
R Liu, Q Yu, Stella Yu
ECCV 2020 (Oral)
The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos
R Liu, Z Wu, Stella Yu, Steve Lin
NeurIPS 2021
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
R Liu, C Liu, Y Bai, Alan Yuille
CVPR 2019
Automatic Document Metadata Extraction Based on Deep Networks
R Liu, L Gao, D An, Z Jiang, Z Tang
NLPCC 2017 (Oral)
ModelGrow: Continual Text-to-Video Pre-training with Model Expansion and Language Understanding Enhancement
Z Rao, L Ji, Y Xing, R Liu, Z Liu, J Xie, Z Peng, Y He, Qifeng Chen
arXiv 2024, under review
SketchInverter: Multi-Class Sketch-Based Image Generation via GAN Inversion
J Yu, Z An, R Liu, C Wang, Qian Yu
WACV 2023 (Poster)
3D Shape Reconstruction from Free-Hand Sketches
J Wang, J Lin, Q Yu, R Liu, Y Chen, Stella Yu
ECCV Workshop 2022

Selected Awards