Tianhe Ren

I am currently working at the International Digital Economy Academy (IDEA) as a Computer Vision Engineer, advised by Prof. Lei Zhang. In 2021, I received my bachelor's degree from Xiamen University, where I was a member of the MAC Lab, advised by Associate Professor Yiyi Zhou and Prof. Rongrong Ji.

Email  /  Google Scholar  /  Github  /  ZhiHu


  • 2023-09: LaVIN is accepted to NeurIPS 2023.
  • 2023-07: Stable-DINO and DFA3D are accepted to ICCV 2023.
  • 2023-06: Release the detrex preprint for benchmarking Transformer-based instance recognition algorithms.
  • 2023-05: Grounded-SAM is accepted to present a demo at ICCV 2023 in Paris.
  • 2023-04: Release Grounded-SAM, which aims to detect and segment anything with text inputs.
  • 2023-02: YOSO is accepted to CVPR 2023.
  • 2022-09: SparseSAM is accepted to NeurIPS 2022.
  • 2021-07: TRAR is accepted to ICCV 2021.
    Selected Publications

    See full list at Google Scholar. (* indicates equal contribution, # indicates corresponding author)

    Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
    Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, Rongrong Ji#
    NeurIPS, 2023

    A novel parameter-efficient method for enhancing the vision-language capabilities of large language models. When applied to LLaMA, the resulting model, LaVIN, demonstrates competitive performance on both single-modality and multi-modality tasks, with high efficiency and reduced training costs.

    DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting
    Hongyang Li*, Hao Zhang*, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, Lei Zhang#
    ICCV, 2023

    A new operator named 3D-Deformable-Attention for 2D-to-3D feature lifting, which can be used in a range of 3D detection models to boost their performance.

    Detection Transformer with Stable Matching
    Shilong Liu*, Tianhe Ren*, Jiayu Chen*, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang#
    ICCV, 2023

    Addresses the unstable matching issue in DETR-based models, which is caused by multi-path optimization, with a simple and efficient loss design that uses position metrics to supervise the classification scores of positive examples.

    Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
    Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Tianshuo Xu, Xiaoshuai Sun, Rongrong Ji#, Dacheng Tao#
    ArXiv, 2023

    We propose Sparse SAM (SSAM), an efficient training scheme that improves upon Sharpness-Aware Minimization (SAM) by applying sparse perturbations via a binary mask, which reduces computational overhead. SSAM is shown to maintain or even enhance performance while being more efficient than SAM, achieving up to 50% sparsity in perturbations.

    detrex: Benchmarking Detection Transformers
    Tianhe Ren*, Shilong Liu*, Feng Li*, Hao Zhang*, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang#
    ArXiv, 2023

    A standardized and unified benchmarking tool for Transformer-based object detection, segmentation, pose estimation and other visual recognition tasks.

    Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
    Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang#
    ArXiv, 2023

    A simple and strong DETR-based framework for open-set detection, achieving zero-shot 52.5 AP on COCO (training without COCO data).

    You Only Segment Once: Towards Real-Time Panoptic Segmentation
    Jie Hu, Linyan Huang, Tianhe Ren, Shengchuan Zhang, Rongrong Ji, Liujuan Cao#
    CVPR, 2023

    A novel framework for real-time panoptic segmentation that achieves competitive performance compared to state-of-the-art methods.

    Exploring Vision Transformers as Diffusion Learners
    He Cao, Jianan Wang, Tianhe Ren, Xianbiao Qi, Yihao Chen, Yuan Yao, Lei Zhang#
    ArXiv, 2022

    A plain, non-hierarchical Vision Transformer (ViT) backbone for diffusion models.

    Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
    Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji#, Dacheng Tao#
    NeurIPS, 2022

    An efficient variant of the SAM optimizer that computes a sparse perturbation based on Fisher information and dynamic sparse training.

    TRAR: Routing the Attention Spans in Transformers for Visual Question Answering
    Yiyi Zhou*, Tianhe Ren*, Chaoyang Zhu, Xiaoshuai Sun#, Jianzhuang Liu, Xinghao Ding, Mingliang Xu, Rongrong Ji
    ICCV, 2021

    A novel dynamic routing attention mechanism that brings consistent performance gains across a range of vision and language tasks.

    Open Source Projects

    * indicates project lead, # indicates directional lead

    Grounded-SAM: Detect, Segment and Generate Anything
    Tianhe Ren*, Shilong Liu*, He Cao, Feng Li, Hao Zhang, Kunchang Li, Jiayu Chen, Hongyang Li, Lei Zhang#
    ICCV Demo Track, 2023   (Github Trending Top-1 Project)

    A strong vision foundation model pipeline that combines Grounding-DINO and Segment-Anything-Model to detect and segment everything with arbitrary text prompts.

    detrex: research platform for transformer-based instance recognition algorithms.
    Tianhe Ren*, Shilong Liu*, Hao Zhang*, Feng Li*, Xingyu Liao, Lei Zhang#

    A unified and lightweight research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.

    SimREC: light-weight toolbox for referring expression comprehension and segmentation.
    Gen Luo*#, Tianhe Ren*

    A simple and efficient toolbox for the research of referring expression comprehension and segmentation, supporting large-scale pre-training and multi-task learning.
