# ResNet

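While everyone was still marveling at GoogLeNet's Inception structure, researchers at Microsoft Research Asia were already designing ResNet, a deeper but structurally simpler network, and with it they swept the 2015 ImageNet competition.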

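ResNet effectively addresses the difficulty of training deep neural networks and makes it possible to train convolutional networks with as many as 1000 layers. Deep networks are hard to train because of the vanishing gradient problem: the farther a layer is from the loss function, the smaller its gradient during backpropagation and the harder it is to update, and the effect gets worse as the number of layers grows. Two approaches were commonly used to deal with this:

1. Layer-wise training: train the shallower layers first, then keep adding layers. This does not work particularly well and is cumbersome.
2. Using wider layers, or more output channels, instead of making the network deeper. This kind of architecture usually does not perform well either.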

ResNet solves the problem of gradients vanishing during backpropagation by introducing cross-layer (skip) connections.

![](images/ResNet_PlainNet.png)

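The figure compares ordinary connections with cross-layer residual connections. With ordinary connections (left), the gradient from the upper layers has to be passed back one layer at a time. With residual connections (right), there is effectively a shorter path in between, and the gradient can flow back along it, which keeps it from becoming too small.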
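The effect can be checked numerically. The following is a minimal sketch, not part of the original notebook: it stacks 50 small `tanh` layers and compares the gradient that reaches the input with and without a skip connection around each layer. The depth, the width, and the helper name `input_grad_norm` are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

def input_grad_norm(depth, width, residual):
    """Norm of d(sum of outputs)/d(input) for a stack of tanh layers."""
    layers = [nn.Linear(width, width) for _ in range(depth)]
    x = torch.randn(8, width, requires_grad=True)
    h = x
    for layer in layers:
        out = torch.tanh(layer(h))
        # Residual variant: add the layer input back onto the layer output
        h = h + out if residual else out
    h.sum().backward()
    return x.grad.norm().item()

plain_norm = input_grad_norm(50, 16, residual=False)
skip_norm = input_grad_norm(50, 16, residual=True)
print('plain: {:.3e}, with skip connections: {:.3e}'.format(plain_norm, skip_norm))
```

With the plain stack, the gradient norm at the input should come out many orders of magnitude smaller than with skip connections, which is the behavior the figure describes.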

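Suppose the input to some layers is $x$ and the desired output is $H(x)$.
* If we pass the input $x$ straight through to the output as an initial result, we effectively have a shallower network, which is easier to train.
* Whatever this shortcut fails to capture can be learned by additional layers $F(x)$, so the deeper network remains easy to train.
* What those layers ultimately have to fit is $F(x) = H(x) - x$, the residual, which is where the structure gets its name.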
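To make the idea concrete, here is a small sketch under simplifying assumptions: `TinyResidual` is a hypothetical toy block in which $F$ is a single linear layer rather than a stack of convolutions. When $F(x) = 0$, the block reduces exactly to the identity mapping, so adding such a block cannot make the starting point worse than the shallower network.

```python
import torch
from torch import nn

class TinyResidual(nn.Module):
    """Hypothetical toy block computing x + F(x), with F a single linear layer."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Linear(dim, dim)
        # Start from F(x) = 0 so that the block is exactly the identity mapping
        nn.init.zeros_(self.f.weight)
        nn.init.zeros_(self.f.bias)

    def forward(self, x):
        return x + self.f(x)

block = TinyResidual(4)
x = torch.randn(2, 4)
y = block(x)
print(torch.equal(y, x))  # True: with F(x) = 0 the output equals the input
```

Training then only has to move $F$ away from zero by as much as the target $H(x)$ differs from $x$.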



## 1. ResidualBlock

A residual network is simply a stack of residual blocks like the one above. Let's implement a residual block.

In [11]:
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F
from torch.autograd import Variable  # deprecated since PyTorch 0.4, where Tensor and Variable were merged
from torchvision.datasets import CIFAR10
from torchvision import transforms as tfs

In [2]:
def conv3x3(in_channel, out_channel, stride=1):
    return nn.Conv2d(in_channel, out_channel, 3, 
                     stride=stride, padding=1, bias=False)

In [3]:
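class Residual_Block(nn.Module):
    """Two 3x3 conv + batch-norm layers with a shortcut connection around them."""
    def __init__(self, in_channel, out_channel, same_shape=True):
        super(Residual_Block, self).__init__()
        self.same_shape = same_shape
        # When the shape changes, the first conv halves the spatial resolution
        stride = 1 if self.same_shape else 2

        self.conv1 = conv3x3(in_channel, out_channel, stride=stride)
        self.bn1 = nn.BatchNorm2d(out_channel)

        self.conv2 = conv3x3(out_channel, out_channel)
        self.bn2 = nn.BatchNorm2d(out_channel)
        if not self.same_shape:
            # 1x1 conv on the shortcut so that x matches the main branch
            # in both channel count and spatial size
            self.conv3 = nn.Conv2d(in_channel, out_channel, 1,
                                   stride=stride)

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu(self.bn1(out), True)  # True = in-place ReLU
        out = self.conv2(out)
        # Note: ReLU is applied here before the addition; the original paper
        # applies it only after the addition
        out = F.relu(self.bn2(out), True)

        if not self.same_shape:
            x = self.conv3(x)
        return F.relu(x+out, True)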

Let's test the input and output of a residual block.

In [4]:
# Input and output have the same shape
test_net = Residual_Block(32, 32)
test_x = torch.zeros(1, 32, 96, 96)
print('input: {}'.format(test_x.shape))
test_y = test_net(test_x)
print('output: {}'.format(test_y.shape))

input: torch.Size([1, 32, 96, 96])
output: torch.Size([1, 32, 96, 96])


In [5]:
# Input and output have different shapes
test_net = Residual_Block(3, 32, False)
test_x = torch.zeros(1, 3, 96, 96)
print('input: {}'.format(test_x.shape))
test_y = test_net(test_x)
print('output: {}'.format(test_y.shape))

input: torch.Size([1, 3, 96, 96])
output: torch.Size([1, 32, 48, 48])
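The second test only works because the shortcut is projected to the new shape before the addition. The sketch below isolates that step with two standalone layers. The variable names are illustrative, and the layers mirror `conv1` and `conv3` of the block rather than reusing them.

```python
import torch
from torch import nn

x = torch.zeros(1, 3, 96, 96)

# Main branch of a down-sampling block: 3x3 conv, stride 2, padding 1
main = nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False)
# Shortcut branch: 1x1 conv with the same stride, so both shapes agree
shortcut = nn.Conv2d(3, 32, 1, stride=2)

main_out = main(x)
shortcut_out = shortcut(x)
print(main_out.shape, shortcut_out.shape)

# Element-wise addition is only defined because the two shapes match
total = main_out + shortcut_out
```

Without the 1x1 convolution, the shortcut would still have 3 channels at 96x96 and could not be added to the 32-channel 48x48 output of the main branch.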


The structure of a Residual_Block is shown in the figure below.

![resnet-block.png](images/resnet-block.png)

## 2. Implementing the ResNet Network

Next we implement a ResNet, which is just a stack of residual blocks.

In [6]:
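class ResNet(nn.Module):
    def __init__(self, in_channel, num_classes, verbose=False):
        super(ResNet, self).__init__()
        self.verbose = verbose  # print the output shape of every block in forward()

        # Stem: 7x7 convolution with stride 2
        self.block1 = nn.Conv2d(in_channel, 64, 7, 2)

        # Max pooling followed by two residual blocks at 64 channels
        self.block2 = nn.Sequential(
            nn.MaxPool2d(3, 2),
            Residual_Block(64, 64),
            Residual_Block(64, 64)
        )

        # Each of the next three stages doubles the channels and
        # halves the spatial resolution in its first block
        self.block3 = nn.Sequential(
            Residual_Block(64, 128, False),
            Residual_Block(128, 128)
        )

        self.block4 = nn.Sequential(
            Residual_Block(128, 256, False),
            Residual_Block(256, 256)
        )

        # AvgPool2d(3) reduces the 3x3 map produced by a 96x96 input to 1x1
        self.block5 = nn.Sequential(
            Residual_Block(256, 512, False),
            Residual_Block(512, 512),
            nn.AvgPool2d(3)
        )

        self.classifier = nn.Linear(512, num_classes)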
        
    def forward(self, x):
        x = self.block1(x)
        if self.verbose:
            print('block 1 output: {}'.format(x.shape))
        x = self.block2(x)
        if self.verbose:
            print('block 2 output: {}'.format(x.shape))
        x = self.block3(x)
        if self.verbose:
            print('block 3 output: {}'.format(x.shape))
        x = self.block4(x)
        if self.verbose:
            print('block 4 output: {}'.format(x.shape))
        x = self.block5(x)
        if self.verbose:
            print('block 5 output: {}'.format(x.shape))
        x = x.view(x.shape[0], -1)
        x = self.classifier(x)
        return x

Let's print the output size after each block.

In [8]:
test_net = ResNet(3, 10, True)
test_x = torch.zeros(1, 3, 96, 96)
test_y = test_net(test_x)
print('output: {}'.format(test_y.shape))

block 1 output: torch.Size([1, 64, 45, 45])
block 2 output: torch.Size([1, 64, 22, 22])
block 3 output: torch.Size([1, 128, 11, 11])
block 4 output: torch.Size([1, 256, 6, 6])
block 5 output: torch.Size([1, 512, 1, 1])
output: torch.Size([1, 10])
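These sizes follow from the usual output-size formula for convolution and pooling layers, $\lfloor (n + 2p - k) / s \rfloor + 1$. The short sketch below traces a 96-pixel input through the down-sampling layers of the network. The helper `out_size` is a hypothetical name introduced here, and the stride-1 residual blocks are skipped because they leave the size unchanged.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 96
trace = []

size = out_size(size, 7, 2)      # block1: 7x7 conv, stride 2, no padding
trace.append(size)
size = out_size(size, 3, 2)      # block2: 3x3 max pooling, stride 2
trace.append(size)
for _ in range(3):               # block3 to block5: 3x3 conv, stride 2, padding 1
    size = out_size(size, 3, 2, 1)
    trace.append(size)
size = out_size(size, 3, 3)      # AvgPool2d(3): kernel 3, stride defaults to 3
trace.append(size)

print(trace)  # [45, 22, 11, 6, 3, 1]
```

The value 3 is the feature-map size inside block 5 before the average pooling. This is why `nn.AvgPool2d(3)` yields a 1x1 map for 96x96 inputs, and why other input sizes would need a different pooling size.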


In [12]:
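from utils import train  # training-loop helper from the local utils module (not shown here)

# Upscale the 32x32 CIFAR-10 images to 96x96, the input size the network
# above was checked with, and normalize every channel to [-1, 1]
data_tf = tfs.Compose([
    tfs.Resize(96),
    tfs.ToTensor(),
    tfs.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

# Pass download=True on the first run if the dataset is not yet in ../../data
train_set  = CIFAR10('../../data', train=True,  transform=data_tf)
train_data = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_set   = CIFAR10('../../data', train=False, transform=data_tf)
test_data  = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)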

net = ResNet(3, 10)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
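The `train` helper imported from `utils` is not shown in this section. As a rough guide to what such a helper does, here is a hypothetical stand-in, `train_sketch`. Its argument order is inferred from how `train` is called in this notebook, and its return value (four per-epoch lists: train loss, train accuracy, validation loss, validation accuracy) is an assumption based on how `res` is indexed when plotting. The real helper may differ.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_sketch(net, train_data, valid_data, num_epochs, optimizer, criterion):
    """Hypothetical stand-in for utils.train; returns four per-epoch lists."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    net = net.to(device)
    history = ([], [], [], [])  # train loss, train acc, valid loss, valid acc

    def run_epoch(loader, training):
        net.train(training)  # switches batch norm between train and eval mode
        total_loss, correct, count = 0.0, 0, 0
        with torch.set_grad_enabled(training):
            for images, labels in loader:
                images, labels = images.to(device), labels.to(device)
                logits = net(images)
                loss = criterion(logits, labels)
                if training:
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                total_loss += loss.item() * labels.size(0)
                correct += (logits.argmax(dim=1) == labels).sum().item()
                count += labels.size(0)
        return total_loss / count, correct / count

    for epoch in range(num_epochs):
        stats = run_epoch(train_data, True) + run_epoch(valid_data, False)
        for values, value in zip(history, stats):
            values.append(value)
        print('[{:2d}] Train:(L={:.6f}, Acc={:.6f}), Valid:(L={:.6f}, Acc={:.6f})'
              .format(epoch, *stats))
    return history

# Smoke test on random data (not CIFAR-10), just to show the call pattern
toy_set = TensorDataset(torch.randn(32, 8), torch.randint(0, 3, (32,)))
toy_loader = DataLoader(toy_set, batch_size=16)
toy_net = nn.Linear(8, 3)
toy_opt = torch.optim.Adam(toy_net.parameters(), lr=1e-3)
history = train_sketch(toy_net, toy_loader, toy_loader, 2, toy_opt, nn.CrossEntropyLoss())
```

Switching the network to evaluation mode for the validation pass matters here because the residual blocks use batch normalization, which behaves differently during training and inference.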

In [None]:
res = train(net, train_data, test_data, 20, optimizer, criterion)

[ 0] Train:(L=1.506980, Acc=0.449868), Valid:(L=1.119623, Acc=0.598596), T: 00:00:48
[ 1] Train:(L=1.022635, Acc=0.641504), Valid:(L=0.942414, Acc=0.669600), T: 00:00:47
[ 2] Train:(L=0.806174, Acc=0.717551), Valid:(L=0.921687, Acc=0.682061), T: 00:00:47
[ 3] Train:(L=0.638939, Acc=0.775555), Valid:(L=0.802450, Acc=0.729727), T: 00:00:47
[ 4] Train:(L=0.497571, Acc=0.826606), Valid:(L=0.658700, Acc=0.775316), T: 00:00:47
[ 5] Train:(L=0.364864, Acc=0.872442), Valid:(L=0.717290, Acc=0.768888), T: 00:00:47
[ 6] Train:(L=0.263076, Acc=0.907888), Valid:(L=0.832575, Acc=0.750000), T: 00:00:47
[ 7] Train:(L=0.181254, Acc=0.935782), Valid:(L=0.818366, Acc=0.764933), T: 00:00:47
[ 8] Train:(L=0.124111, Acc=0.957820), Valid:(L=0.883527, Acc=0.778184), T: 00:00:47
[ 9] Train:(L=0.108587, Acc=0.961657), Valid:(L=0.899127, Acc=0.780756), T: 00:00:47
[10] Train:(L=0.091386, Acc=0.968670), Valid:(L=0.975022, Acc=0.781448), T: 00:00:47
[11] Train:(L=0.079259, Acc=0.972287), Valid:(L=1.061239, Acc=0.7

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

# Loss curves: res[0] holds the training values, res[2] the validation values
plt.plot(res[0], label='train')
plt.plot(res[2], label='valid')
plt.xlabel('epoch')
plt.ylabel('Loss')
plt.legend(loc='best')
plt.savefig('fig-res-resnet-train-validate-loss.pdf')
plt.show()

# Accuracy curves: res[1] holds the training values, res[3] the validation values
plt.plot(res[1], label='train')
plt.plot(res[3], label='valid')
plt.xlabel('epoch')
plt.ylabel('Acc')
plt.legend(loc='best')
plt.savefig('fig-res-resnet-train-validate-acc.pdf')
plt.show()

# Save the raw curves (numpy was imported as np in the first cell)
np.save('fig-res-resnet_data.npy', res)

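ResNet's cross-layer connections make it possible to train very deep convolutional neural networks. It also uses a very simple convolutional layer configuration, which makes it easier to extend.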



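## Exercises

* Try the bottleneck structure proposed in the paper.
* Change the order conv -> bn -> relu to bn -> relu -> conv and see whether accuracy improves.
* Add a 1x1 convolution to Residual_Block and compare the results.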
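As a possible starting point for the first exercise, here is a hedged sketch of a bottleneck block: a 1x1 convolution reduces the channels, a 3x3 convolution does the spatial work, and a second 1x1 convolution expands the channels again. The class name `BottleneckSketch`, the `expansion=4` default, and the choice to put the stride on the 3x3 convolution are assumptions of this sketch, not something the notebook prescribes.

```python
import torch
from torch import nn
import torch.nn.functional as F

class BottleneckSketch(nn.Module):
    """Illustrative bottleneck block: 1x1 reduce -> 3x3 -> 1x1 expand."""
    def __init__(self, in_channel, mid_channel, stride=1, expansion=4):
        super().__init__()
        out_channel = mid_channel * expansion
        self.conv1 = nn.Conv2d(in_channel, mid_channel, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channel)
        self.conv2 = nn.Conv2d(mid_channel, mid_channel, 3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_channel)
        self.conv3 = nn.Conv2d(mid_channel, out_channel, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel)

        # Project the shortcut whenever its shape differs from the main branch
        self.shortcut = None
        if stride != 1 or in_channel != out_channel:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channel, out_channel, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channel)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        identity = x if self.shortcut is None else self.shortcut(x)
        return F.relu(out + identity)  # ReLU only after the addition

block = BottleneckSketch(64, 64, stride=2)
y = block(torch.zeros(1, 64, 48, 48))
print(y.shape)  # torch.Size([1, 256, 24, 24])
```

To use it in the `ResNet` class above, the channel counts of consecutive stages would have to account for the fourfold expansion.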

## References
* [Residual Networks (ResNet)](https://d2l.ai/chapter_convolutional-modern/resnet.html)
* [An Overview of ResNet and its Variants](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035)