Tianhe Ren

I am currently working at the International Digital Economy Academy (IDEA) as a Computer Vision Engineer, advised by Prof. Lei Zhang. In 2021, I received my bachelor's degree from the MAC Lab at Xiamen University, advised by Associate Prof. Yiyi Zhou and Prof. Rongrong Ji. My primary research interests are vision foundation models, object detection and segmentation, and multi-modal learning. I'm also passionate about open-source projects in the AI community. The research work and open-source projects I'm involved in have garnered almost 20.0K stars on GitHub.

Email  /  Google Scholar  /  Github  /  ZhiHu


See the full list on Google Scholar (* indicates equal contribution, # indicates corresponding author). Representative papers or projects are highlighted.

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang#
Tech report, Jan. 2024

A technical report giving an overview of our Grounded-SAM project, covering its base pipeline and further applications.

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang*, Hongyang Li*, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao#, Lei Zhang#, Chunyuan Li*, Jianwei Yang*
Tech report, Dec. 2023

LLaVA-Grounding connects a Large Multimodal Model (LMM) with a grounding model to facilitate grounded visual chat.

Visual In-Context Prompting
Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang#, Lei Zhang#, Jianfeng Gao#
Computer Vision and Pattern Recognition (CVPR), 2024

DINOv is a visual in-context prompting framework for referring and generic segmentation tasks.

T-Rex: Counting by Visual Prompting
Qing Jiang, Feng Li, Tianhe Ren, Shilong Liu, Zhaoyang Zeng, Kent Yu, Lei Zhang#
Tech report, Nov. 2023

T-Rex is an object counting model that first detects and then counts any object through visual prompting.

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li#
Tech report, Oct. 2023

Extending LLaVA by incorporating a large and diverse set of external tools that can be selected, composed, and activated on the fly for performing tasks.

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, Rongrong Ji#
Conference on Neural Information Processing Systems (NeurIPS), 2023

A novel parameter-efficient method for enhancing large language models' vision-language capabilities. When applied to LLaMA, our LaVIN demonstrates competitive performance on both single-modality and multi-modality tasks, with high efficiency and reduced training costs.

Grounded-SAM: Detect and Segment Anything with Text Prompt
Tianhe Ren*, Shilong Liu*, Kunchang Li, Ailing Zeng, He Cao, Jiayu Chen, Jing Lin, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang#
International Conference on Computer Vision (ICCV) Demo Track, 2023
(GitHub Trending Top-1 Project)

A strong vision foundation model pipeline that combines Grounding-DINO and Segment-Anything-Model to detect and segment everything with arbitrary text prompts.

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting
Hongyang Li*, Hao Zhang*, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, Lei Zhang#
International Conference on Computer Vision (ICCV), 2023

A new operator named 3D-Deformable-Attention for 2D-to-3D feature lifting, which can be used in a range of 3D detection models to boost their performance.

Detection Transformer with Stable Matching
Shilong Liu*, Tianhe Ren*, Jiayu Chen*, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang#
International Conference on Computer Vision (ICCV), 2023

Addresses the unstable matching issue in DETR-based models caused by multi-path optimization by introducing a simple and efficient loss design that uses position metrics to supervise the classification scores of positive examples.

detrex: Benchmarking Detection Transformers
Tianhe Ren*, Shilong Liu*, Feng Li*, Hao Zhang*, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang#
Tech report, May 2023

A standardized and unified benchmarking tool for Transformer-based object detection, segmentation, pose estimation and other visual recognition tasks.

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang#
Tech report, Apr. 2023

A simple and strong DETR-based framework for open-set detection, achieving zero-shot 52.5 AP on COCO (training without COCO data).

You Only Segment Once: Towards Real-Time Panoptic Segmentation
Jie Hu, Linyan Huang, Tianhe Ren, Shengchuan Zhang, Rongrong Ji, Liujuan Cao#
Computer Vision and Pattern Recognition (CVPR), 2023

A novel framework for real-time panoptic segmentation with competitive performance compared to state-of-the-art methods.

Exploring Vision Transformers as Diffusion Learners
He Cao, Jianan Wang, Tianhe Ren, Xianbiao Qi, Yihao Chen, Yuan Yao#, Lei Zhang#
Tech report, Oct. 2022

A plain, non-hierarchical Vision Transformer (ViT) backbone for diffusion models.

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji#, Dacheng Tao#
Conference on Neural Information Processing Systems (NeurIPS), 2022

An efficient variant of the SAM optimizer, achieved by computing a sparse perturbation based on Fisher information and dynamic sparse training.

TRAR: Routing the Attention Spans in Transformers for Visual Question Answering
Yiyi Zhou, Tianhe Ren, Chaoyang Zhu, Xiaoshuai Sun#, Jianzhuang Liu, Xinghao Ding, Mingliang Xu, Rongrong Ji#
International Conference on Computer Vision (ICCV), 2021

A novel dynamic routing attention mechanism that brings consistent performance gains across a range of vision and language tasks.

Amazing template by Jon Barron. Big thanks!