Lin Chen 陈林
PhD Student
School of Automation,
University of Science and Technology of China (USTC)
Greetings! I'm a Ph.D. candidate at the School of Automation, University of Science and Technology of China (USTC) (Jan. 2020 - present), advised by Prof. Feng Zhao. I lead the vision-language model group at USTC-BIVLab, and I am also an LVLM research intern on the Seed team at ByteDance.
I'm currently working on large vision-language models, with a focus on the multimodal reasoning and video understanding capabilities of foundation models. Discussions and collaborations are welcome! Please feel free to reach out via email or WeChat (xiaoachen98).
✨ NOTE: Our lab [Link] is looking for talented students and researchers to join us. Positions for Master's, Ph.D., and post-doc candidates are open. If you are interested in our research and would like to join us, please contact me!
* denotes equal contribution.
Seed2.0 Model Card: Towards Intelligence Frontier for Real-World Complexity
ByteDance Seed, 2026 [PDF] [Project]
Seed1.8 Model Card: Towards Generalized Real-World Agency
ByteDance Seed, 2025 [PDF] [GitHub]
Seed1.5-VL Technical Report
ByteDance Seed, 2025 [PDF] [Code]
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang, Xiaoyi Dong, et al., 2024 [PDF] [Code]
♠ (Co-)First-author papers
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao
NeurIPS, 2024 [PDF] [Project] [Code]
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin
ECCV, 2024 [PDF] [Project] [Demo] [Code]
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen*, Xilin Wei*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang
NeurIPS, 2024 [PDF] [Project] [Code]
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
Zhixiang Wei*, Lin Chen*, Yi Jin*, Xiaoxiao Ma, Tianle Liu, Pengyang Ling, Ben Wang, Huaian Chen, Jinjin Zheng
CVPR, 2024 [PDF] [Project] [Code]
FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing
Pengyang Ling*, Lin Chen*, Pan Zhang, Huaian Chen, Yi Jin
CVPR, 2024 [PDF] [Project] [Demo] [Code]
Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement
Zhixiang Wei*, Lin Chen*, Tao Tu, Huaian Chen, Pengyang Ling, Yi Jin
ICCV, 2023 [PDF] [Code]
Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation
Lin Chen*, Zhixiang Wei*, Xin Jin*, Huaian Chen, Miao Zheng, Kai Chen, Yi Jin
NeurIPS, 2022 [PDF] [Code]
Reusing the Task-specific Classifier as a Discriminator: Discriminator-free Adversarial Domain Adaptation
Lin Chen*, Huaian Chen*, Zhixiang Wei, Xin Jin, Xiao Tan, Yi Jin, Enhong Chen
CVPR, 2022 [PDF] [Code]
♠ Co-author papers
CompBench: Benchmarking Complex Instruction-guided Image Editing
CVPR, 2026
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
ICLR, 2026
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
ICLR, 2026
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning
NeurIPS, 2025
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
EMNLP Main, 2025
Enhancing Large Vision-Language Models with Ultra-Detailed Image Caption Generation
EMNLP Main, 2025
VFM-Adapter: Adapting Visual Foundation Models for Dense Prediction with Dynamic Hybrid Operation Mapping
AAAI, 2025
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
NeurIPS, 2024