Pyramid vision transformer: A versatile backbone for dense prediction without convolutions
Introduction
@article{wang2021pyramid,
title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
journal={arXiv preprint arXiv:2102.12122},
year={2021}
}
@article{wang2021pvtv2,
title={PVTv2: Improved Baselines with Pyramid Vision Transformer},
author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
journal={arXiv preprint arXiv:2106.13797},
year={2021}
}
Results and Models
RetinaNet (PVTv1)
RetinaNet (PVTv2)