Pyramid vision transformer: A versatile backbone for dense prediction without convolutions

Introduction

@article{wang2021pyramid,
  title={Pyramid vision transformer: A versatile backbone for dense prediction without convolutions},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  journal={arXiv preprint arXiv:2102.12122},
  year={2021}
}

@article{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  journal={arXiv preprint arXiv:2106.13797},
  year={2021}
}

Results and Models

RetinaNet (PVTv1)

| Backbone | Lr schd | Mem (GB) | box AP | Config | Download |
| --- | --- | --- | --- | --- | --- |
| PVT-Tiny | 12e | 8.5 | 36.6 | config | model \| log |
| PVT-Small | 12e | 14.5 | 40.4 | config | model \| log |
| PVT-Medium | 12e | 20.9 | 41.7 | config | model \| log |
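Reading the PVTv1 rows in order, each step up in backbone size costs a similar amount of memory but buys less box AP. The following Python sketch makes that arithmetic explicit, using the table's numbers exactly as given. The helper `marginal_gains` is hypothetical, written only for illustration; it is not part of MMDetection or the PVT code.

```python
# PVTv1 RetinaNet results, copied from the table above:
# (backbone, reported memory in GB, box AP)
PVT_V1 = [
    ("PVT-Tiny", 8.5, 36.6),
    ("PVT-Small", 14.5, 40.4),
    ("PVT-Medium", 20.9, 41.7),
]


def marginal_gains(rows):
    """Hypothetical helper: for each consecutive pair of rows, return
    (from, to, extra memory in GB, extra box AP, AP gained per extra GB)."""
    gains = []
    for (name_a, mem_a, ap_a), (name_b, mem_b, ap_b) in zip(rows, rows[1:]):
        d_mem = round(mem_b - mem_a, 1)  # round away float noise
        d_ap = round(ap_b - ap_a, 1)
        gains.append((name_a, name_b, d_mem, d_ap, round(d_ap / d_mem, 2)))
    return gains


if __name__ == "__main__":
    for a, b, d_mem, d_ap, per_gb in marginal_gains(PVT_V1):
        print(f"{a} -> {b}: +{d_mem} GB, +{d_ap} AP ({per_gb} AP/GB)")
```

With these numbers, Tiny to Small adds 3.8 AP for 6.0 GB (about 0.63 AP/GB), while Small to Medium adds 1.3 AP for 6.4 GB (about 0.2 AP/GB).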

RetinaNet (PVTv2)

| Backbone | Lr schd | Mem (GB) | box AP | Config | Download |
| --- | --- | --- | --- | --- | --- |
| PVTv2-B0 | 12e | 7.4 | 37.1 | config | model \| log |
| PVTv2-B1 | 12e | 9.5 | 41.2 | config | model \| log |
| PVTv2-B2 | 12e | 16.2 | 44.6 | config | model \| log |
| PVTv2-B3 | 12e | 23.0 | 46.0 | config | model \| log |
| PVTv2-B4 | 12e | 17.0 | 46.3 | config | model \| log |
| PVTv2-B5 | 12e | 18.7 | 46.1 | config | model \| log |
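If the goal is to choose a PVTv2 backbone for a given memory limit, the table can be queried directly. The sketch below assumes that the "Mem (GB)" column is the quantity to budget against and that the highest box AP is the only selection criterion; both are illustrative assumptions. `best_under_budget` is a hypothetical helper, not part of any library.

```python
from typing import List, Optional, Tuple

# PVTv2 RetinaNet results (12e schedule), copied from the table above:
# (backbone, reported memory in GB, box AP)
PVT_V2: List[Tuple[str, float, float]] = [
    ("PVTv2-B0", 7.4, 37.1),
    ("PVTv2-B1", 9.5, 41.2),
    ("PVTv2-B2", 16.2, 44.6),
    ("PVTv2-B3", 23.0, 46.0),
    ("PVTv2-B4", 17.0, 46.3),
    ("PVTv2-B5", 18.7, 46.1),
]


def best_under_budget(
    rows: List[Tuple[str, float, float]], budget_gb: float
) -> Optional[str]:
    """Hypothetical helper: name of the highest-box-AP backbone whose
    reported memory fits within budget_gb, or None if nothing fits.
    The selection rule (max AP within budget) is an assumption."""
    fitting = [row for row in rows if row[1] <= budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda row: row[2])[0]


if __name__ == "__main__":
    for budget in (8.0, 10.0, 17.0, 24.0):
        print(budget, best_under_budget(PVT_V2, budget))
```

Because the table reports less memory for PVTv2-B4 (17.0 GB) than for PVTv2-B3 (23.0 GB), a 17 GB budget already selects PVTv2-B4, which also has the highest box AP in the table (46.3). The sketch simply trusts the reported numbers.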
