ResNet极大地改变了如何参数化深层网络中函数的观点。
稠密连接网络 (DenseNet)[Huang.Liu.Van-Der-Maaten.ea.2017]
在某种程度上是 ResNet 的逻辑扩展。让我们先从数学上了解一下。
回想一下任意函数的泰勒展开式(Taylor expansion),它把这个函数分解成越来越高阶的项。在$x$接近0时,
$$f(x) = f(0) + f'(0) x + \frac{f''(0)}{2!} x^2 + \frac{f'''(0)}{3!} x^3 + \ldots.$$
同样,ResNet 将函数展开为
$$f(\mathbf{x}) = \mathbf{x} + g(\mathbf{x}).$$
也就是说,ResNet 将 $f$ 分解为两部分:一个简单的线性项和一个更复杂的非线性项。
那么再向前拓展一步,如果我们想将 $f$ 拓展成超过两部分的信息呢?
一种方案便是 DenseNet。
ResNet 和 DenseNet 的关键区别在于,DenseNet 的输出是连结(用 $[\cdot,\cdot]$ 表示)而不是如 ResNet 那样的简单相加。
因此,在应用越来越复杂的函数序列后,我们执行从 $\mathbf{x}$ 到其展开式的映射:
$$\mathbf{x} \to \left[
\mathbf{x},
f_1(\mathbf{x}),
f_2([\mathbf{x}, f_1(\mathbf{x})]), f_3([\mathbf{x}, f_1(\mathbf{x}), f_2([\mathbf{x}, f_1(\mathbf{x})])]), \ldots\right].$$
最后,将这些展开式结合到多层感知机中,再次减少特征的数量。
实现起来非常简单:我们不需要将各项相加,而是将它们连结起来。
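为了直观说明这一点,下面给出一段极简的示意代码(张量形状均为演示用的假设值),对比相加与连结对通道数的不同影响:
import tensorflow as tf

# 假设输入有 3 个通道
x = tf.random.uniform((4, 8, 8, 3))
g_x = tf.random.uniform((4, 8, 8, 3))    # ResNet 中 g(x) 必须与 x 形状相同才能相加
f_x = tf.random.uniform((4, 8, 8, 10))   # DenseNet 中 f(x) 的通道数即增长率

res_out = x + g_x                          # 相加:输出通道数仍为 3
dense_out = tf.concat([x, f_x], axis=-1)   # 连结:输出通道数变为 3 + 10 = 13
print(res_out.shape, dense_out.shape)      # (4, 8, 8, 3) (4, 8, 8, 13)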
DenseNet 这个名字由变量之间的“稠密连接”而得来:最后一层与之前的所有层紧密相连。
稠密网络主要由两部分构成:稠密块(dense block)和过渡层(transition layer)。
前者定义如何连接输入和输出,而后者则控制通道数量,使其不会太复杂。
DenseNet 使用了 ResNet 改良版的“批量归一化、激活和卷积”结构。
我们首先实现一下这个结构。
import tensorflow as tf
import tensorlayer as tl
# 允许 GPU 显存按需增长,避免一次性占满显存
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
Using TensorFlow backend.
class BottleNeck(tl.layers.Module):
def __init__(self, growth_rate, drop_rate):
super(BottleNeck, self).__init__()
self.bn1 = tl.layers.BatchNorm()
self.conv1 = tl.layers.Conv2d(n_filter=4 * growth_rate,
filter_size=(1, 1),
strides=(1,1),
padding="SAME")
self.bn2 = tl.layers.BatchNorm()
self.conv2 = tl.layers.Conv2d(n_filter=growth_rate,
filter_size=(3, 3),
strides=(1,1),
padding="SAME")
        # 注意:tl.layers.Dropout 的 keep 参数表示保留概率,这里直接传入了 drop_rate
        self.dropout = tl.layers.Dropout(keep=drop_rate)
self.listLayers = [self.bn1,
tl.layers.PRelu(channel_shared=True),
self.conv1,
self.bn2,
tl.layers.PRelu(channel_shared=True),
self.conv2,
self.dropout]
def forward(self, x):
y = x
for layer in self.listLayers:
y = layer(y)
y = tf.keras.layers.concatenate([x, y], axis=-1)
return y
一个稠密块由多个卷积块组成,每个卷积块使用相同数量的输出通道。
然而,在前向传播中,我们将每个卷积块的输入和输出在通道维上连结。
class DenseBlock(tl.layers.Module):
def __init__(self, num_layers, growth_rate, drop_rate=0.5):
super(DenseBlock, self).__init__()
self.num_layers = num_layers
self.growth_rate = growth_rate
self.drop_rate = drop_rate
self.listLayers = []
for _ in range(num_layers):
self.listLayers.append(BottleNeck(growth_rate=self.growth_rate, drop_rate=self.drop_rate))
def forward(self, x):
for layer in self.listLayers:
x = layer(x)
return x
在下面的例子中,我们定义一个有 2 个输出通道数为 10 的卷积块的 DenseBlock。
使用通道数为 3 的输入时,我们会得到通道数为 $3+2\times 10=23$ 的输出。
卷积块的通道数控制了输出通道数相对于输入通道数的增长,因此也被称为增长率(growth rate)。
blk = DenseBlock(2, 10)
X = tf.random.uniform((4, 8, 8, 3))
Y = blk(X)
Y.shape
[TL] BatchNorm batchnorm_9: decay: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TL] Conv2d conv2d_9: n_filter: 40 filter_size: (1, 1) strides: (1, 1) pad: SAME act: No Activation
[TL] BatchNorm batchnorm_10: decay: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TL] Conv2d conv2d_10: n_filter: 10 filter_size: (3, 3) strides: (1, 1) pad: SAME act: No Activation
[TL] Dropout dropout_3: keep: 0.500000
[TL] PRelu prelu_9: channel_shared: True
[TL] PRelu prelu_10: channel_shared: True
[TL] BatchNorm batchnorm_11: decay: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TL] Conv2d conv2d_11: n_filter: 40 filter_size: (1, 1) strides: (1, 1) pad: SAME act: No Activation
[TL] BatchNorm batchnorm_12: decay: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TL] Conv2d conv2d_12: n_filter: 10 filter_size: (3, 3) strides: (1, 1) pad: SAME act: No Activation
[TL] Dropout dropout_4: keep: 0.500000
[TL] PRelu prelu_11: channel_shared: True
[TL] PRelu prelu_12: channel_shared: True
TensorShape([4, 8, 8, 23])
由于每个稠密块都会使通道数增加,稠密块叠加过多会让模型变得过于复杂。
而过渡层可以用来控制模型复杂度:
它通过 $1\times 1$ 卷积层来减小通道数,并使用步幅为 2 的池化层将高和宽减半(下面的实现使用最大池化层),从而进一步降低模型复杂度。
class TransitionLayer(tl.layers.Module):
def __init__(self, out_channels):
super(TransitionLayer, self).__init__()
self.bn = tl.layers.BatchNorm()
self.conv = tl.layers.Conv2d(n_filter=out_channels,
filter_size=(1, 1),
strides=(1,1),
padding="same")
self.pool = tl.layers.MaxPool2d(filter_size=(2, 2),
strides=(2,2),
padding="SAME")
def forward(self, inputs):
x = self.bn(inputs)
x = tl.relu(x)
x = self.conv(x)
x = self.pool(x)
return x
对上一个例子中稠密块的输出使用通道数为 10 的过渡层。
此时输出的通道数减为 10,高和宽均减半。
blk = TransitionLayer(10)
blk(Y).shape
[TL] BatchNorm batchnorm_13: decay: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TL] Conv2d conv2d_13: n_filter: 10 filter_size: (1, 1) strides: (1, 1) pad: same act: No Activation
[TL] MaxPool2d maxpool2d_1: filter_size: (2, 2) strides: (2, 2) padding: SAME
TensorShape([4, 4, 4, 10])
我们来构造 DenseNet 模型。DenseNet 首先使用同 ResNet 一样的单卷积层和最大池化层。
接下来,类似于ResNet使用的4个残差块,DenseNet使用的是4个稠密块。
与 ResNet 类似,我们可以通过参数 block_layers 设置每个稠密块使用多少个卷积块,
例如 DenseNet-121 的 4 个稠密块分别使用 6、12、24、16 个卷积块。
稠密块里卷积层的通道数(即增长率 growth_rate)设为 32,因此仅第一个稠密块就会增加 $6\times 32=192$ 个通道。
在每个模块之间,ResNet通过步幅为2的残差块减小高和宽,DenseNet则使用过渡层来减半高和宽,并减半通道数。
最后接上全局池化层和全连接层来输出结果。
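在进入实现之前,可以先用一小段示意代码核对各阶段的通道数(这里按下文 densenet-121 的配置推算,计算逻辑与 DenseNet 类中的 num_channels 一致,仅作演示):
# 按 DenseNet-121 的配置(初始 64 通道,增长率 32,压缩率 0.5)推算各阶段通道数
num_init_features, growth_rate = 64, 32
block_layers, compression_rate = [6, 12, 24, 16], 0.5

num_channels = num_init_features
for i, num_layers in enumerate(block_layers):
    num_channels += growth_rate * num_layers            # 稠密块:每个卷积块增加 growth_rate 个通道
    print(f"dense_block_{i + 1} 输出通道数: {num_channels}")
    if i < len(block_layers) - 1:                       # 最后一个稠密块后面没有过渡层
        num_channels = int(compression_rate * num_channels)
        print(f"transition_{i + 1} 输出通道数: {num_channels}")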
class DenseNet(tl.layers.Module):
def __init__(self, num_init_features, growth_rate, block_layers, compression_rate, drop_rate):
super(DenseNet, self).__init__()
self.conv = tl.layers.Conv2d(n_filter=num_init_features,
filter_size=(7, 7),
strides=(2,2),
padding="SAME")
self.bn = tl.layers.BatchNorm()
self.pool = tl.layers.MaxPool2d(filter_size=(3, 3),
strides=(2,2),
padding="SAME")
self.num_channels = num_init_features
self.dense_block_1 = DenseBlock(num_layers=block_layers[0], growth_rate=growth_rate, drop_rate=drop_rate)
self.num_channels += growth_rate * block_layers[0]
self.num_channels = compression_rate * self.num_channels
self.transition_1 = TransitionLayer(out_channels=int(self.num_channels))
self.dense_block_2 = DenseBlock(num_layers=block_layers[1], growth_rate=growth_rate, drop_rate=drop_rate)
self.num_channels += growth_rate * block_layers[1]
self.num_channels = compression_rate * self.num_channels
self.transition_2 = TransitionLayer(out_channels=int(self.num_channels))
self.dense_block_3 = DenseBlock(num_layers=block_layers[2], growth_rate=growth_rate, drop_rate=drop_rate)
self.num_channels += growth_rate * block_layers[2]
self.num_channels = compression_rate * self.num_channels
self.transition_3 = TransitionLayer(out_channels=int(self.num_channels))
self.dense_block_4 = DenseBlock(num_layers=block_layers[3], growth_rate=growth_rate, drop_rate=drop_rate)
self.avgpool = tl.layers.GlobalMeanPool2d()
        # 输出 logits,训练时由损失函数 softmax_cross_entropy_with_logits 内部做 softmax
        self.fc = tl.layers.Dense(n_units=10)
def forward(self, inputs):
x = self.conv(inputs)
x = self.bn(x)
x = tl.relu(x)
x = self.pool(x)
x = self.dense_block_1(x)
x = self.transition_1(x)
x = self.dense_block_2(x)
x = self.transition_2(x)
x = self.dense_block_3(x)
        x = self.transition_3(x)
x = self.dense_block_4(x)
x = self.avgpool(x)
x = self.fc(x)
return x
下面再构建一个只包含 3 个稠密块的变体(对应下文工厂函数中的 densenet-100):
class DenseNet_100(tl.layers.Module):
def __init__(self, num_init_features, growth_rate, block_layers, compression_rate, drop_rate):
super(DenseNet_100, self).__init__()
self.conv = tl.layers.Conv2d(n_filter=num_init_features,
filter_size=(7, 7),
strides=(2,2),
padding="SAME")
self.bn = tl.layers.BatchNorm()
self.pool = tl.layers.MaxPool2d(filter_size=(3, 3),
strides=(2,2),
padding="SAME")
self.num_channels = num_init_features
self.dense_block_1 = DenseBlock(num_layers=block_layers[0], growth_rate=growth_rate, drop_rate=drop_rate)
self.num_channels += growth_rate * block_layers[0]
self.num_channels = compression_rate * self.num_channels
self.transition_1 = TransitionLayer(out_channels=int(self.num_channels))
self.dense_block_2 = DenseBlock(num_layers=block_layers[1], growth_rate=growth_rate, drop_rate=drop_rate)
self.num_channels += growth_rate * block_layers[1]
self.num_channels = compression_rate * self.num_channels
self.transition_2 = TransitionLayer(out_channels=int(self.num_channels))
self.dense_block_3 = DenseBlock(num_layers=block_layers[2], growth_rate=growth_rate, drop_rate=drop_rate)
self.num_channels += growth_rate * block_layers[2]
self.num_channels = compression_rate * self.num_channels
self.transition_3 = TransitionLayer(out_channels=int(self.num_channels))
self.avgpool = tl.layers.GlobalMeanPool2d()
        # 输出 logits,训练时由损失函数 softmax_cross_entropy_with_logits 内部做 softmax
        self.fc = tl.layers.Dense(n_units=10)
def forward(self, inputs):
x = self.conv(inputs)
x = self.bn(x)
x = tl.relu(x)
x = self.pool(x)
x = self.dense_block_1(x)
x = self.transition_1(x)
x = self.dense_block_2(x)
x = self.transition_2(x)
x = self.dense_block_3(x)
        x = self.transition_3(x)
        x = self.avgpool(x)
        x = self.fc(x)
return x
def densenet(x):
if x == 'densenet-121':
return DenseNet(num_init_features=64, growth_rate=32, block_layers=[6, 12, 24, 16], compression_rate=0.5,
drop_rate=0.5)
elif x == 'densenet-169':
return DenseNet(num_init_features=64, growth_rate=32, block_layers=[6 , 12, 32, 32], compression_rate=0.5,
drop_rate=0.5)
elif x == 'densenet-201':
return DenseNet(num_init_features=64, growth_rate=32, block_layers=[6, 12, 48, 32], compression_rate=0.5,
drop_rate=0.5)
elif x == 'densenet-264':
return DenseNet(num_init_features=64, growth_rate=32, block_layers=[6, 12, 64, 48], compression_rate=0.5,
drop_rate=0.5)
elif x=='densenet-100':
return DenseNet_100(num_init_features=64, growth_rate=12, block_layers=[16, 16, 16], compression_rate=0.5,
drop_rate=0.5)
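可以先用一个随机张量粗略检查工厂函数返回的网络能否正常前向传播(下面 96×96 的输入尺寸只是演示用的假设值):
# 构建 densenet-121,并用随机输入检查输出形状
demo_net = densenet('densenet-121')
demo_X = tf.random.uniform((1, 96, 96, 3))
demo_Y = demo_net(demo_X)
print(demo_Y.shape)   # 期望为 (1, 10),对应 10 个类别的输出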
下面直接使用原项目提供的训练代码,在 CIFAR-10 数据集上训练 densenet-100。
import time
import multiprocessing
import tensorflow as tf
import os
os.environ['TL_BACKEND'] = 'tensorflow'
import tensorlayer as tl
from DenseNet.DenseNet_tensorlayer import densenet
tl.logging.set_verbosity(tl.logging.DEBUG)
X_train, y_train, X_test, y_test = tl.files.load_cifar10_dataset(shape=(-1, 32, 32, 3), plotable=False)
# get the network
net = densenet("densenet-100")
# training settings
batch_size = 128
n_epoch = 500
learning_rate = 0.0001
print_freq = 5
n_step_epoch = int(len(y_train) / batch_size)
n_step = n_epoch * n_step_epoch
shuffle_buffer_size = 128
train_weights = net.trainable_weights
optimizer = tl.optimizers.Adam(learning_rate)
metrics = tl.metric.Accuracy()
def generator_train():
inputs = X_train
targets = y_train
if len(inputs) != len(targets):
raise AssertionError("The length of inputs and targets should be equal")
for _input, _target in zip(inputs, targets):
# yield _input.encode('utf-8'), _target.encode('utf-8')
yield _input, _target
def generator_test():
inputs = X_test
targets = y_test
if len(inputs) != len(targets):
raise AssertionError("The length of inputs and targets should be equal")
for _input, _target in zip(inputs, targets):
# yield _input.encode('utf-8'), _target.encode('utf-8')
yield _input, _target
def _map_fn_train(img, target):
# 1. Randomly crop a [height, width] section of the image.
img = tf.image.random_crop(img, [24, 24, 3])
# 2. Randomly flip the image horizontally.
img = tf.image.random_flip_left_right(img)
# 3. Randomly change brightness.
img = tf.image.random_brightness(img, max_delta=63)
# 4. Randomly change contrast.
img = tf.image.random_contrast(img, lower=0.2, upper=1.8)
# 5. Subtract off the mean and divide by the variance of the pixels.
img = tf.image.per_image_standardization(img)
target = tf.reshape(target, ())
return img, target
def _map_fn_test(img, target):
# 1. Crop the central [height, width] of the image.
img = tf.image.resize_with_pad(img, 24, 24)
# 2. Subtract off the mean and divide by the variance of the pixels.
img = tf.image.per_image_standardization(img)
img = tf.reshape(img, (24, 24, 3))
target = tf.reshape(target, ())
return img, target
# dataset API and augmentation
train_ds = tf.data.Dataset.from_generator(
generator_train, output_types=(tf.float32, tf.int32)
) # , output_shapes=((24, 24, 3), (1)))
train_ds = train_ds.map(_map_fn_train,num_parallel_calls=multiprocessing.cpu_count())
# train_ds = train_ds.repeat(n_epoch)
train_ds = train_ds.shuffle(shuffle_buffer_size)
train_ds = train_ds.prefetch(buffer_size=4096)
train_ds = train_ds.batch(batch_size)
# value = train_ds.make_one_shot_iterator().get_next()
test_ds = tf.data.Dataset.from_generator(
generator_test, output_types=(tf.float32, tf.int32)
) # , output_shapes=((24, 24, 3), (1)))
# test_ds = test_ds.shuffle(shuffle_buffer_size)
test_ds = test_ds.map(_map_fn_test,num_parallel_calls=multiprocessing.cpu_count())
# test_ds = test_ds.repeat(n_epoch)
test_ds = test_ds.prefetch(buffer_size=4096)
test_ds = test_ds.batch(batch_size)
# value_test = test_ds.make_one_shot_iterator().get_next()
class WithLoss(tl.layers.Module):
def __init__(self, net, loss_fn):
super(WithLoss, self).__init__()
self._net = net
self._loss_fn = loss_fn
def forward(self, data, label):
out = self._net(data)
loss = self._loss_fn(out, label)
return loss
net_with_loss = WithLoss(net, loss_fn=tl.cost.softmax_cross_entropy_with_logits)
net_with_train = tl.models.TrainOneStep(net_with_loss, optimizer, train_weights)
for epoch in range(n_epoch):
start_time = time.time()
net.set_train()
train_loss, train_acc, n_iter = 0, 0, 0
for X_batch, y_batch in train_ds:
X_batch = tl.ops.convert_to_tensor(X_batch.numpy(), dtype=tl.float32)
y_batch = tl.ops.convert_to_tensor(y_batch.numpy(), dtype=tl.int64)
_loss_ce = net_with_train(X_batch, y_batch)
train_loss += _loss_ce
n_iter += 1
_logits = net(X_batch)
metrics.update(_logits, y_batch)
train_acc += metrics.result()
metrics.reset()
print("Epoch {} of {} took {}".format(epoch + 1, n_epoch, time.time() - start_time))
print(" train loss: {}".format(train_loss / n_iter))
print(" train acc: {}".format(train_acc / n_iter))
[TL] Load or Download cifar10 > data\cifar10
[TL] Conv2d conv2d_134: n_filter: 64 filter_size: (7, 7) strides: (2, 2) pad: SAME act: No Activation
[TL] BatchNorm batchnorm_134: decay: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TL] MaxPool2d maxpool2d_6: filter_size: (3, 3) strides: (2, 2) padding: SAME
[TL] BatchNorm batchnorm_135: decay: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TL] Conv2d conv2d_135: n_filter: 48 filter_size: (1, 1) strides: (1, 1) pad: SAME act: No Activation
[TL] BatchNorm batchnorm_136: decay: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TL] Conv2d conv2d_136: n_filter: 12 filter_size: (3, 3) strides: (1, 1) pad: SAME act: No Activation
[TL] Dropout dropout_63: keep: 0.500000
[TL] PRelu prelu_129: channel_shared: True
[TL] PRelu prelu_130: channel_shared: True
……(其余卷积块与过渡层的构建日志与上面类似,此处省略)
[TL] BatchNorm batchnorm_233: decay: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TL] Conv2d conv2d_233: n_filter: 176 filter_size: (1, 1) strides: (1, 1) pad: same act: No Activation
[TL] MaxPool2d maxpool2d_9: filter_size: (2, 2) strides: (2, 2) padding: SAME
[TL] GlobalMeanPool2d globalmeanpool2d_2
[TL] Dense dense_2: 10 EagerTensor
Epoch 1 of 500 took 2.472132682800293
train loss: 2.3026280403137207
train acc: 0.0859375
……(训练循环在每个 batch 后都会打印一次进度,后续输出类似,此处省略)
Epoch 1 of 500 took 46.64212989807129
train loss: 2.302560806274414
train acc: 0.09964139014482498
(此后训练在第 1 个 epoch 内被手动中断,抛出 KeyboardInterrupt,完整的错误回溯信息此处省略。)
下面将同样的训练流程应用到 densenet-121 上,并假设 ImageNet 数据集已在本地准备好,其中 load_ImageNet_dataset 为占位函数。
import time
import multiprocessing
import tensorflow as tf
import os
os.environ['TL_BACKEND'] = 'tensorflow'
import tensorlayer as tl
from DenseNet.DenseNet_tensorlayer import densenet
tl.logging.set_verbosity(tl.logging.DEBUG)
def load_ImageNet_dataset(shape=(-1, 256, 256, 3), plotable=False):
    '''占位函数:假设 ImageNet 数据集已在本地加载。
    实际使用时请替换为自己的加载逻辑,返回 NumPy 数组形式的 X_train, y_train, X_test, y_test。'''
    return X_train, y_train, X_test, y_test
# get the network
net = densenet("densenet-121")
X_train, y_train, X_test, y_test = load_ImageNet_dataset(shape=(-1, 256, 256, 3), plotable=False)
# training settings
batch_size = 128
n_epoch = 500
learning_rate = 0.0001
print_freq = 5
n_step_epoch = int(len(y_train) / batch_size)
n_step = n_epoch * n_step_epoch
shuffle_buffer_size = 128
train_weights = net.trainable_weights
optimizer = tl.optimizers.Adam(learning_rate)
metrics = tl.metric.Accuracy()
def generator_train():
inputs = X_train
targets = y_train
if len(inputs) != len(targets):
raise AssertionError("The length of inputs and targets should be equal")
for _input, _target in zip(inputs, targets):
# yield _input.encode('utf-8'), _target.encode('utf-8')
yield _input, _target
def generator_test():
inputs = X_test
targets = y_test
if len(inputs) != len(targets):
raise AssertionError("The length of inputs and targets should be equal")
for _input, _target in zip(inputs, targets):
# yield _input.encode('utf-8'), _target.encode('utf-8')
yield _input, _target
def _map_fn_train(img, target):
# 1. Randomly crop a [height, width] section of the image.
img = tf.image.random_crop(img, [24, 24, 3])
# 2. Randomly flip the image horizontally.
img = tf.image.random_flip_left_right(img)
# 3. Randomly change brightness.
img = tf.image.random_brightness(img, max_delta=63)
# 4. Randomly change contrast.
img = tf.image.random_contrast(img, lower=0.2, upper=1.8)
# 5. Subtract off the mean and divide by the variance of the pixels.
img = tf.image.per_image_standardization(img)
target = tf.reshape(target, ())
return img, target
def _map_fn_test(img, target):
# 1. Crop the central [height, width] of the image.
img = tf.image.resize_with_pad(img, 24, 24)
# 2. Subtract off the mean and divide by the variance of the pixels.
img = tf.image.per_image_standardization(img)
img = tf.reshape(img, (24, 24, 3))
target = tf.reshape(target, ())
return img, target
# dataset API and augmentation
train_ds = tf.data.Dataset.from_generator(
generator_train, output_types=(tf.float32, tf.int32)
) # , output_shapes=((24, 24, 3), (1)))
train_ds = train_ds.map(_map_fn_train,num_parallel_calls=multiprocessing.cpu_count())
# train_ds = train_ds.repeat(n_epoch)
train_ds = train_ds.shuffle(shuffle_buffer_size)
train_ds = train_ds.prefetch(buffer_size=4096)
train_ds = train_ds.batch(batch_size)
# value = train_ds.make_one_shot_iterator().get_next()
test_ds = tf.data.Dataset.from_generator(
generator_test, output_types=(tf.float32, tf.int32)
) # , output_shapes=((24, 24, 3), (1)))
# test_ds = test_ds.shuffle(shuffle_buffer_size)
test_ds = test_ds.map(_map_fn_test,num_parallel_calls=multiprocessing.cpu_count())
# test_ds = test_ds.repeat(n_epoch)
test_ds = test_ds.prefetch(buffer_size=4096)
test_ds = test_ds.batch(batch_size)
# value_test = test_ds.make_one_shot_iterator().get_next()
class WithLoss(tl.layers.Module):
def __init__(self, net, loss_fn):
super(WithLoss, self).__init__()
self._net = net
self._loss_fn = loss_fn
def forward(self, data, label):
out = self._net(data)
loss = self._loss_fn(out, label)
return loss
net_with_loss = WithLoss(net, loss_fn=tl.cost.softmax_cross_entropy_with_logits)
net_with_train = tl.models.TrainOneStep(net_with_loss, optimizer, train_weights)
for epoch in range(n_epoch):
start_time = time.time()
net.set_train()
train_loss, train_acc, n_iter = 0, 0, 0
for X_batch, y_batch in train_ds:
X_batch = tl.ops.convert_to_tensor(X_batch.numpy(), dtype=tl.float32)
y_batch = tl.ops.convert_to_tensor(y_batch.numpy(), dtype=tl.int64)
_loss_ce = net_with_train(X_batch, y_batch)
train_loss += _loss_ce
n_iter += 1
_logits = net(X_batch)
metrics.update(_logits, y_batch)
train_acc += metrics.result()
metrics.reset()
print("Epoch {} of {} took {}".format(epoch + 1, n_epoch, time.time() - start_time))
print(" train loss: {}".format(train_loss / n_iter))
print(" train acc: {}".format(train_acc / n_iter))