Dataset: CIFAR-10
Models: Alexnet, Lenet
Settings: pruning ratios of 0.5 and 0.7

Model - pruning operator | Test runs | Acc | Pruning ratio | Compression ratio | Inference throughput (samples/s) |
---|---|---|---|---|---|
Alexnet - no pruning | 5 | 94.89% | - | 1x | 5409 |
Alexnet - bn | 5 | 98.81% | 50% | 1.4x | 5968 |
Alexnet - conv_all | 5 | 93.95% | 50% | 1.3x | 5969 |
Alexnet - conv_avg | 5 | 98.56% | 50% | 1.3x | 5865 |
Alexnet - conv_max | 5 | 97.44% | 50% | 1.3x | 5555 |
Alexnet - random | 5 | 97.32% | 50% | 1.3x | 5580 |
Alexnet - conv_threshold | 5 | 98.03% | 50% | 1.3x | 5567 |
Lenet - no pruning | 5 | 75.72% | - | 1x | 5821 |
Lenet - bn | 5 | 64.89% | 70% | 3x | 1923 |
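Dataset: SST-2
Environment: a single 2080Ti GPU
Settings: the maximum sequence length is 128 for BERT-type models and 32 for LSTM-type models; the vocabulary size is 10000

Model | Test runs | Acc | Layers | Hidden size / feed-forward size | Model size | Compression ratio | Inference time | Inference speedup |
---|---|---|---|---|---|---|---|---|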
BERT_base(Teacher) | 5 | 92.2% | 12 | 768/3072 | 110M | 1x | 4.04s | 1x |
KD | 5 | 80.5% | 3 | 312/1200 | 14.5M | 7.5x | 0.81s | 5.0x |
BiLSTM | 5 | 80.4% | 1 | 300/400 | 15.3M | 7.2x | 0.83s | 4.8x |
Distilled-BiLSTM | 5 | 82.9% | 1 | 300/400 | 15.3M | 7.2x | 0.83s | 4.8x |
BERT-PKD(from scratch) | 5 | 81.5% | 3 | 768/3072 | 45.7M | 2.4x | 1.69s | 2.4x |
BERT-PKD | 5 | 88.4% | 3 | 768/3072 | 45.7M | 2.4x | 1.69s | 2.4x |
TinyBERT | 5 | 91.3% | 4 | 312/1200 | 14.5M | 7.5x | 0.65s | 6.2x |
BERT-of-Theseus | 5 | 87.2% | 4 | 768/3072 | 53.7M | 2.05x | 2.05s | 2.0x |
Note: the layer count excludes the embedding and prediction layers.
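
The compression and speedup columns of the SST-2 table appear to be ratios against the BERT_base teacher row. The sketch below recomputes them under that assumption, reading "110M" as millions of parameters (also an assumption). The names `TEACHER_PARAMS_M`, `TEACHER_TIME_S`, `students`, and `ratios` are illustrative and do not come from the original project. The recomputed values agree with the table only to within about 0.1, so the published figures were probably rounded or truncated.

```python
# Teacher baseline, copied from the BERT_base row of the SST-2 table.
TEACHER_PARAMS_M = 110.0  # model size; assumed to mean millions of parameters
TEACHER_TIME_S = 4.04     # inference time in seconds

# (model size in M, inference time in s) for each student, copied from the table.
students = {
    "KD": (14.5, 0.81),
    "BiLSTM": (15.3, 0.83),
    "Distilled-BiLSTM": (15.3, 0.83),
    "BERT-PKD": (45.7, 1.69),
    "TinyBERT": (14.5, 0.65),
    "BERT-of-Theseus": (53.7, 2.05),
}


def ratios(params_m, time_s):
    """Return (compression ratio, inference speedup) relative to the teacher."""
    return TEACHER_PARAMS_M / params_m, TEACHER_TIME_S / time_s


if __name__ == "__main__":
    for name, (params_m, time_s) in students.items():
        compression, speedup = ratios(params_m, time_s)
        print(f"{name}: {compression:.2f}x smaller, {speedup:.2f}x faster")
```

Keeping the teacher values in one place makes it easy to recompute both columns if a different baseline is measured.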