Yucheng Han
I graduated from Tsinghua University in 2021 and am now a third-year Ph.D. student at Nanyang Technological University, advised by Professor Hanwang Zhang. I am currently working as an intern under the supervision of Gang Yu.
I mainly focus on computer vision. I can be contacted by email at yucheng002@e.ntu.edu.sg.
I'm on the job market and looking for a Research Scientist/Engineer position starting in summer 2025. Feel free to contact me if you have an opening!
Email / CV / Scholar / Twitter / Github
Research
I'm interested in computer vision and, more recently, generative AI. I also have experience with multi-modal large language models, 3D object detection, prompt learning, and video summarization.
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
Yucheng Han*, Rui Wang*, Chi Zhang*, Juntao Hu, Pei Cheng, Bin Fu, Hanwang Zhang
arXiv, 2024
project page / arXiv
EMMA, a novel model built on ELLA, enhances multi-modal conditioned image generation through a unique perceiver resampler. It preserves fidelity and detail in the generated images while still following text instructions, proving an effective solution for diverse multi-modal conditional image generation tasks.
AppAgent: Multimodal Agents as Smartphone Users
Chi Zhang*, Zhao Yang*, Jiaxuan Liu*, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu
arXiv, 2024
project page / arXiv
This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. The agent interacts with apps through a simplified action space that mimics human interactions such as tapping and swiping.
ChartLlama: A Multimodal LLM for Chart Understanding and Generation
Yucheng Han*, Chi Zhang*, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang
arXiv, 2024
project page / arXiv
This paper proposes an instruction-following dataset construction method for chart figures and fine-tunes LLaVA-1.5-13B to comprehend and generate chart figures.
Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
Yucheng Han, Na Zhao*, Weiling Chen, Keng Teck Ma, Hanwang Zhang
AAAI, 2024
project page / arXiv
We propose DPKE, a novel Dual-Perspective Knowledge Enrichment approach for semi-supervised 3D object detection. DPKE enriches the knowledge of limited training data, particularly unlabeled data, from both a data perspective and a feature perspective.
Prompt-aligned Gradient for Prompt Tuning
Beier Zhu, Yulei Niu, Yucheng Han, Yue Wu, Hanwang Zhang
ICCV, 2023
project page / arXiv
We present Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from forgetting the general knowledge learned by VLMs. In particular, ProGrad only updates the prompt along gradients that are aligned (or non-conflicting) with the "general direction", which is represented by the gradient of the KL loss with respect to the prediction of the pre-defined prompt.
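The update rule described above can be sketched in a few lines. This is a minimal sketch, not the paper's implementation: gradients are flattened into plain Python lists, the helper name `prograd_direction` is hypothetical, and it assumes that a conflicting task gradient is handled by projecting out its component along the general direction.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def prograd_direction(grad_task: Vector, grad_general: Vector) -> Vector:
    """Hypothetical helper: the gradient used to update the prompt.

    grad_task:    gradient of the downstream (task) loss w.r.t. the prompt.
    grad_general: the "general direction", i.e. the gradient of the KL loss
                  with respect to the pre-defined prompt's prediction.
    """
    alignment = dot(grad_task, grad_general)
    if alignment >= 0:
        # Aligned / non-conflicting: keep the task gradient unchanged.
        return list(grad_task)
    # Conflicting (assumption: handled by projection). Here alignment < 0,
    # so grad_general is nonzero and the division is safe.
    scale = alignment / dot(grad_general, grad_general)
    return [g - scale * h for g, h in zip(grad_task, grad_general)]


if __name__ == "__main__":
    print(prograd_direction([1.0, 2.0], [0.0, 1.0]))   # [1.0, 2.0] (aligned)
    print(prograd_direction([1.0, -1.0], [0.0, 1.0]))  # [1.0, 0.0] (projected)
```

When the two gradients conflict, the returned vector has zero inner product with the general direction, so to first order a step along it does not push the prompt against the general knowledge.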