Selected Publications
See the full list on Google Scholar. (* indicates equal contribution, # indicates corresponding author)
|
|
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Gen Luo,
Yiyi Zhou,
Tianhe Ren,
Shengxin Chen,
Xiaoshuai Sun,
Rongrong Ji#
NeurIPS, 2023
A novel parameter-efficient method for enhancing the vision-language capabilities of large language models. When applied to LLaMA, the resulting model, LaVIN, demonstrates competitive performance on both single-modality and multi-modality tasks at a significantly reduced training cost.
|
|
DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting
Hongyang Li*,
Hao Zhang*,
Zhaoyang Zeng,
Shilong Liu,
Feng Li,
Tianhe Ren,
Lei Zhang#
ICCV, 2023
A new operator, 3D Deformable Attention, for 2D-to-3D feature lifting, which can be plugged into a range of 3D detection models to boost their performance.
|
|
Detection Transformer with Stable Matching
Shilong Liu*,
Tianhe Ren*,
Jiayu Chen*,
Zhaoyang Zeng,
Hao Zhang,
Feng Li,
Hongyang Li,
Jun Huang,
Hang Su,
Jun Zhu,
Lei Zhang#
ICCV, 2023
Addresses the unstable matching issue in DETR-based models caused by multi-path optimization with a simple and efficient loss design that uses position metrics to supervise the classification scores of positive examples.
|
|
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Peng Mi,
Li Shen,
Tianhe Ren,
Yiyi Zhou,
Tianshuo Xu,
Xiaoshuai Sun,
Rongrong Ji#,
Dacheng Tao#
ArXiv, 2023
We propose Sparse SAM (SSAM), an efficient training scheme that improves on Sharpness-Aware Minimization (SAM) by applying sparse perturbations through a binary mask, which reduces computational overhead. SSAM maintains or even improves on SAM's performance while being more efficient, achieving up to 50% sparsity in perturbations.
|
|
detrex: Benchmarking Detection Transformers
Tianhe Ren*,
Shilong Liu*,
Feng Li*,
Hao Zhang*,
Ailing Zeng,
Jie Yang,
Xingyu Liao,
Ding Jia,
Hongyang Li,
He Cao,
Jianan Wang,
Zhaoyang Zeng,
Xianbiao Qi,
Yuhui Yuan,
Jianwei Yang,
Lei Zhang#
ArXiv, 2023
A standardized and unified benchmarking tool for Transformer-based object detection, segmentation, pose estimation and other visual recognition tasks.
|
|
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu,
Zhaoyang Zeng,
Tianhe Ren,
Feng Li,
Hao Zhang,
Jie Yang,
Chunyuan Li,
Jianwei Yang,
Hang Su,
Jun Zhu,
Lei Zhang#
ArXiv, 2023
A simple and strong DETR-based framework for open-set detection, achieving zero-shot 52.5 AP on COCO (training without COCO data).
|
|
You Only Segment Once: Towards Real-Time Panoptic Segmentation
Jie Hu,
Linyan Huang,
Tianhe Ren,
Shengchuan Zhang,
Rongrong Ji,
Liujuan Cao#
CVPR, 2023
A novel framework for real-time panoptic segmentation whose performance is competitive with state-of-the-art methods.
|
|
Exploring Vision Transformers as Diffusion Learners
He Cao,
Jianan Wang,
Tianhe Ren,
Xianbiao Qi,
Yihao Chen,
Yuan Yao,
Lei Zhang#
ArXiv, 2022
A plain, non-hierarchical Vision Transformer (ViT) backbone for diffusion models.
|
|
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Peng Mi,
Li Shen,
Tianhe Ren,
Yiyi Zhou,
Xiaoshuai Sun,
Rongrong Ji#,
Dacheng Tao#
NeurIPS, 2022
An efficient variant of the SAM optimizer that computes a sparse perturbation based on Fisher information and dynamic sparse training.
|
|
TRAR: Routing the Attention Spans in Transformers for Visual Question Answering
Yiyi Zhou*,
Tianhe Ren*,
Chaoyang Zhu,
Xiaoshuai Sun#,
Jianzhuang Liu,
Xinghao Ding,
Mingliang Xu,
Rongrong Ji
ICCV, 2021
A novel dynamic routing attention mechanism that brings consistent performance gains across a range of vision-and-language tasks.
|
Open Source Projects
* indicates project lead, # indicates directional lead
|
|
Grounded-SAM: Detect, Segment and Generate Anything
Tianhe Ren*,
Shilong Liu*,
He Cao,
Feng Li,
Hao Zhang,
Kunchang Li,
Jiayu Chen,
Hongyang Li,
Lei Zhang#
ICCV Demo Track, 2023 (GitHub Trending Top-1 Project)
A strong vision foundation model pipeline that combines Grounding DINO and the Segment Anything Model to detect and segment anything with arbitrary text prompts.
|
|
detrex: research platform for transformer-based instance recognition algorithms.
Tianhe Ren*,
Shilong Liu*,
Hao Zhang*,
Feng Li*,
Xingyu Liao,
Lei Zhang#
A unified and lightweight research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
|
|
SimREC: light-weight toolbox for referring expression comprehension and segmentation.
Gen Luo*#,
Tianhe Ren*
A simple and efficient toolbox for research on referring expression comprehension and segmentation, supporting large-scale pre-training and multi-task learning.
|
|