Lin Chen   陈林

PhD Student

School of Automation, University of Science and Technology of China (USTC)

Email: chlin@mail.ustc.edu.cn
Google Scholar: Link
Github: https://github.com/xiaoachen98/
HuggingFace: https://huggingface.co/Lin-Chen


Biography

Greetings! I'm a Ph.D. candidate at the School of Automation, University of Science and Technology of China (USTC) (Jan. 2020 - present), advised by Prof. Feng Zhao. I lead the vision-language model group at USTC-BIVLab, and I am also an LVLM research intern on the Seed team at ByteDance.

I'm currently working on large vision-language models, focusing in particular on the multimodal reasoning and video understanding capabilities of foundation models. Discussions and collaborations are welcome! Please feel free to reach out via email or WeChat (xiaoachen98).

✨ NOTE: Our lab [Link] welcomes talented students and researchers. Positions for Master's, Ph.D., and post-doc candidates are open. If you are interested in our research and would like to join us, just contact me!

News

Experience

Selected Publications

* denotes equal contribution.

Preprint Papers

Seed2.0 Model Card: Towards Intelligence Frontier for Real-World Complexity
ByteDance Seed, 2026
[PDF] [Project]
Seed1.8 Model Card: Towards Generalized Real-World Agency
ByteDance Seed, 2025
[PDF] [Github]
Seed1.5-VL Technical Report
ByteDance Seed, 2025
[PDF] [Code]
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang, Xiaoyi Dong, et al., 2024
[PDF] [Code]

Published Papers

♠ (Co-)first-author papers
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao
NeurIPS, 2024 — Top 10 Most Influential NeurIPS 2024 Papers
[PDF] [Project] [Code]
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin
ECCV, 2024 — Top 5 Most Influential ECCV 2024 Papers
[PDF] [Project] [Demo] [Code]
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen*, Xilin Wei*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang
NeurIPS, 2024
[PDF] [Project] [Code]
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
Zhixiang Wei*, Lin Chen*, Yi Jin*, Xiaoxiao Ma, Tianle Liu, Pengyang Ling, Ben Wang, Huaian Chen, Jinjin Zheng
CVPR, 2024
[PDF] [Project] [Code]
FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing
Pengyang Ling*, Lin Chen*, Pan Zhang, Huaian Chen, Yi Jin
CVPR, 2024
[PDF] [Project] [Demo] [Code]
Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement
Zhixiang Wei*, Lin Chen*, Tao Tu, Huaian Chen, Pengyang Ling, Yi Jin
ICCV, 2023
[PDF] [Code]
Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation
Lin Chen*, Zhixiang Wei*, Xin Jin*, Huaian Chen, Miao Zheng, Kai Chen, Yi Jin
NeurIPS, 2022 — Spotlight
[PDF] [Code]
Reusing the Task-specific Classifier as a Discriminator: Discriminator-free Adversarial Domain Adaptation
Lin Chen*, Huaian Chen*, Zhixiang Wei, Xin Jin, Xiao Tan, Yi Jin, Enhong Chen
CVPR, 2022
[PDF] [Code]
♠ Co-author papers
CompBench: Benchmarking Complex Instruction-guided Image Editing
CVPR, 2026
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
ICLR, 2026
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
ICLR, 2026
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning
NeurIPS, 2025
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
EMNLP Main, 2025
Enhancing Large Vision-Language Models with Ultra-Detailed Image Caption Generation
EMNLP Main, 2025
VFM-Adapter: Adapting Visual Foundation Models for Dense Prediction with Dynamic Hybrid Operation Mapping
AAAI, 2025
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
NeurIPS, 2024