Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Introduction
@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}
Results and models
Mask R-CNN
| Backbone | Pretrain | Lr schd | Multi-scale crop | FP16 | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-T | ImageNet-1K | 1x | no | no | 7.6 | | 42.7 | 39.3 | config | model \| log |
| Swin-T | ImageNet-1K | 3x | yes | no | 10.2 | | 46.0 | 41.6 | config | model \| log |
| Swin-T | ImageNet-1K | 3x | yes | yes | 7.8 | | 46.0 | 41.7 | config | model \| log |
| Swin-S | ImageNet-1K | 3x | yes | yes | 11.9 | | 48.2 | 43.2 | config | model \| log |