Lin Chen (陈林)

Greetings! I'm currently a PhD student in the School of Automation, University of Science and Technology of China (USTC), advised by Prof. Feng Zhao. I received my B.E. degree from Anhui University in 2020 and then joined the USTC-BIVLab. I am currently a research intern at Shanghai AI Laboratory, supervised by Dr. Jiaqi Wang and Dr. Pan Zhang.

My research interests include:

  • image semantic segmentation
  • domain adaptation/generalization
  • parameter-efficient fine-tuning
  • vision-language models

I sincerely welcome discussions and collaborations. If you're interested, please feel free to reach out via email or WeChat (xiaoachen98).

Email  /  Google Scholar  /  Github  /  HuggingFace  /  Twitter


[2024.6] We release ShareGPT4Video, comprising 40K GPT4V-generated captions, 4.8M high-quality captions, a general video captioner, and a superior large multi-modal model, ShareGPT4Video-8B.

[2024.5] We release Open-LLaVA-Next, an open-source implementation of the LLaVA-NeXT series to support the large multi-modal model community. All training data and checkpoints at each stage are open-sourced and friendly for research use.

[2024.4] We release MMStar, an elite vision-indispensable multi-modal benchmark.

[2024.3] Two papers, Rein and FreeDrag, were accepted to CVPR 2024!

[2023.12] One paper, Point-DETR3D, was accepted to AAAI 2024!

[2023.11] 🔥 We release the ShareGPT4V project, comprising 100K GPT4-Vision-generated captions, 1.2M high-quality captions, a general image captioner, and a superior large multi-modal model, ShareGPT4V-7B.

[2023.7] DTP was accepted to ICCV 2023 and achieves SOTA in night-time and full-time semantic segmentation!

[2023.7] We release the FreeDrag framework for more stable and reliable "drag" editing!

[2022.10] Our DDB was selected for a Spotlight presentation at NeurIPS 2022!

[2022.9] DDB was accepted to NeurIPS 2022 and achieves SOTA with ResNet counterparts on the single-source, multi-source, and multi-target domain-adaptive semantic segmentation tasks!

[2022.3] DALN, a discriminator-free adversarial domain adaptation framework, was accepted to CVPR 2022!


[2022-07 ~ Now] Research Intern, Open Algorithm group of Shanghai AI Laboratory.

[2022-03 ~ 2022-06] Computer Vision Intern, MMSegmentation team, OpenMMLab group of Shanghai AI Laboratory.


* indicates equal contribution.

ShareGPT4Video ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen*, Xilin Wei*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang
arXiv, 2024
[paper] [code] [project page]

A large-scale, highly descriptive video-text dataset, with 40K captions annotated by GPT4V and 4.8M captions annotated by our ShareCaptioner-Video, covering 300 hours and 3,000 hours of video, respectively!

MMStar Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao
arXiv, 2024
[paper] [code] [project page]

We identify two primary issues in existing evaluation studies for large vision-language models. We further develop an elite vision-indispensable multi-modal benchmark and two novel metrics to measure data leakage and actual performance gain in multi-modal training.

Reins Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
Zhixiang Wei*, Lin Chen*, Yi Jin*, Xiaoxiao Ma, Tianle Liu, Pengyang Ling, Ben Wang, Huaian Chen, Jinjin Zheng
CVPR, 2024
[paper] [code]

We propose the Rein framework, which efficiently fine-tunes vision foundation models for the domain generalized semantic segmentation (DGSS) task with just 1% trainable parameters, surprisingly surpassing full-parameter fine-tuning. Rein sets a new SOTA on various DGSS benchmarks.

freedrag FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing
Pengyang Ling*, Lin Chen*, Pan Zhang, Huaian Chen, Yi Jin
CVPR, 2024
[paper] [code] [project page] [demo]

We propose a novel "drag" editing framework called FreeDrag free of the burden of erroneous point tracking and enables achieving stable point-based editing in challenging scenarios with similar structures, fine details, or under multi-point targets.

point-detr3d Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-Supervised 3D Object Detection
Hongzhi Gao, Zheng Chen, Zehui Chen, Lin Chen, Jiaming Liu, Shanghang Zhang, Feng Zhao
AAAI, 2024

A teacher-student framework for weakly semi-supervised 3D detection, designed to fully capitalize on point-wise supervision within a constrained instance-wise annotation budget. With only 5% of labeled data, our Point-DETR3D achieves over 90% performance of its fully supervised counterpart.

sharegpt4v ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin
arXiv, 2023
[project page] [paper] [code] [demo]

We propose the ShareGPT4V project, comprising 100K GPT4-Vision-generated captions, 1.2M high-quality captions, a general image captioner, and a superior large multi-modal model, ShareGPT4V-7B.

dtp Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement
Zhixiang Wei*, Lin Chen*, Tao Tu, Huaian Chen, Pengyang Ling, Yi Jin
ICCV, 2023
[paper] [code]

We propose a novel night-time semantic segmentation paradigm, i.e., disentangle then parse (DTP), which explicitly disentangles night-time images into light-invariant reflectance and light-specific illumination components and then recognizes semantics based on their adaptive fusion.

ddb Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation
Lin Chen*, Zhixiang Wei*, Xin Jin*, Huaian Chen, Miao Zheng, Kai Chen, Yi Jin
NeurIPS, 2022, Spotlight
[paper] [code]

We leverage the complementary characteristics of coarse-grained and fine-grained data-mixing techniques to progressively transfer knowledge from the source to the target domain.

daln Reusing the Task-specific Classifier as a Discriminator: Discriminator-free Adversarial Domain Adaptation
Lin Chen*, Huaian Chen*, Zhixiang Wei, Xin Jin, Xiao Tan, Yi Jin, Enhong Chen
CVPR, 2022
[paper] [code]

We reuse the category classifier as a discriminator to form a discriminator-free adversarial learning framework.

🏆 Honors & Awards

  • National Scholarship Award, PRC, 2022.

📝 Academic Service (Reviewer)

  • NeurIPS 2023

  • CVPR 2023, 2024

  • ICCV 2023

  • ICLR 2024

🎓 Education

  University of Science and Technology of China, Anhui, China
  PhD candidate in Computer Vision (Jan. 2020 to present)

  Anhui University, Anhui, China
  B.Eng. in Electronic Information Engineering (2016 to 2020)

Thanks to jonbarron for the original template and to shi for the modifications.