Draw and Guess | 故事尾音

Introduction

The Quick Draw dataset is a collection of 50 million drawings across 345 categories, all contributed by players of the game Quick, Draw!.

Resources

Dataset location: https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/?pli=1
Dataset website: https://quickdraw.withgoogle.com/data
Quick, Draw! online demo: https://quickdraw.withgoogle.com
AutoDraw online demo: https://www.autodraw.com
Related paper: https://arxiv.org/abs/1704.03477

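Model compression

Ever since AlexNet won the ILSVRC 2012 ImageNet image classification competition, convolutional neural networks (CNNs) have swept through the whole field of computer vision. CNN models quickly replaced traditional hand-crafted features and classifiers: they offer an end-to-end approach, they have substantially raised the accuracy on one image benchmark after another, and in some cases they have even surpassed human accuracy (the LFW face recognition task). As CNN models keep approaching the accuracy limit of computer vision tasks, their depth and size have been growing many times over.
The core idea shared by all model compression methods is to use as few parameters as possible while preserving accuracy.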

Below is a comparison of the size and parameter count of several classic models:

Model	Model size (MB)	Parameters (millions)
AlexNet	>200	60
VGG16	>500	138
GoogLeNet	~50	6.8
Inception-v3	90~100	23.2
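
As a rough cross-check of the table, model size can be estimated from the parameter count. The sketch below assumes every parameter is stored as a 32-bit float (4 bytes) and ignores any file-format overhead; both are assumptions, not something the table states, so the estimates will not match published file sizes exactly.

```python
# Rough on-disk size if every parameter is stored as a 32-bit float.
BYTES_PER_FLOAT32 = 4  # assumption: float32 weights, no compression

# Parameter counts (in millions) taken from the table above.
PARAMS_MILLIONS = {
    "AlexNet": 60,
    "VGG16": 138,
    "GoogLeNet": 6.8,
    "Inception-v3": 23.2,
}


def estimate_size_mb(params_millions):
    """Estimate model size in MB (10^6 bytes) for float32 weights."""
    return params_millions * 1e6 * BYTES_PER_FLOAT32 / 1e6


for name, params in PARAMS_MILLIONS.items():
    print(f"{name}: ~{estimate_size_mb(params):.0f} MB")
```

Real model files can deviate from this estimate because they may store extra data or use a different numeric precision.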

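What follows is an awkward situation: models this large can run on only a limited set of platforms and cannot be ported to mobile devices or embedded chips at all. Even when a model is delivered over the network, the high bandwidth cost puts many users off. Large models also pose serious challenges for device power consumption and inference speed. As a result, such models are still some way from practical use.

Under these circumstances, shrinking and accelerating models became an urgent problem. Researchers did propose a series of CNN compression methods early on, including weight pruning and SVD-based matrix factorization, but their compression ratios and efficiency were far from satisfactory.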
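
To make weight pruning concrete, here is a minimal sketch of one common variant, magnitude-based pruning, which zeroes the weights with the smallest absolute values. The text does not say which pruning variant the early work used, so the function name `prune_by_magnitude` and the `sparsity` parameter are illustrative assumptions.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


# Toy weight vector (hypothetical values).
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
print(prune_by_magnitude(weights, sparsity=0.5))
```

The zeroed weights only save space if the model is then stored in a sparse format, which is one reason pruning alone gives limited compression.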

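In recent years, model-shrinking algorithms can be roughly divided into two classes by what they compress: those that compress the numerical values of the weights and those that compress the network architecture. Seen from the angle of computation speed, they can also be divided into methods that only reduce size and methods that reduce size while also improving speed.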

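Replacing fully connected layers with GAP

Global Average Pooling (GAP) first appeared in the paper Network in Network, and many later works have continued to use it. Experiments show that Global Average Pooling can indeed improve CNN performance.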

Fully Connected layer

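For a long time, fully connected layers were standard in CNN classification networks. A fully connected layer is usually followed by an activation function that performs the classification. Assuming this activation is a multi-class softmax, the job of the fully connected layers is to stretch the feature map produced by the last convolutional layer into a vector, multiply that vector by a weight matrix to reduce its dimensionality, and feed the result into the softmax layer to obtain a score for each class.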
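
The flatten, multiply, softmax pipeline described above can be sketched in plain Python. The feature map, weights, and class count below are made-up toy values chosen only to keep the example small.

```python
import math


def flatten(feature_map):
    """Stretch an H x W x C nested list into a flat vector."""
    return [v for row in feature_map for pixel in row for v in pixel]


def dense(vector, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 2x2 feature map with 2 channels -> vector of length 8.
feature_map = [[[1.0, 0.0], [0.0, 1.0]],
               [[0.5, 0.5], [1.0, 1.0]]]
vector = flatten(feature_map)

# Hypothetical weights for 3 classes (3 rows of 8 weights each).
weights = [[0.1] * 8, [0.2] * 8, [0.3] * 8]
biases = [0.0, 0.0, 0.0]

probs = softmax(dense(vector, weights, biases))
print(len(vector), [round(p, 3) for p in probs])
```

Note that the weight matrix needs one entry per (input element, output unit) pair, which is where the large parameter count of fully connected layers comes from.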

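Fully connected layers matter a great deal, but their large number of parameters makes them prone to overfitting, which is why some techniques, such as dropout, exist specifically to counter overfitting.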

Global Average Pooling

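Since fully connected layers reduce the dimensionality of the feature map before it is fed into softmax but also cause overfitting, can pooling be used in place of the fully connected layers?

The answer is yes. Network in Network uses GAP to replace the final fully connected layers, which performs the dimensionality reduction directly and, more importantly, greatly reduces the number of network parameters (in a CNN, the largest share of the parameters actually sits in the trailing fully connected layers). The structure of global average pooling is shown in the figure below: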

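The point of GAP is to regularize the whole network structurally to prevent overfitting. How can we keep the parameter count low, avoiding the overfitting risk of fully connected layers, while still achieving the same transformation? By working directly on the channels of the feature map: if there are 1000 classes in the end, the last convolutional layer outputs a feature map with exactly 1000 channels, and global pooling over that feature map yields a vector of length 1000. This removes the black-box feature mixing of the fully connected layers and gives each channel a concrete class meaning.

Experiments show that this approach is very effective. It has another benefit: the size of the network's input image no longer matters. Note, however, that using GAP may also slow down convergence.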
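
A minimal sketch of GAP in plain Python illustrates both points: the output length depends only on the channel count, not on the spatial size, and the classification head needs far fewer parameters. The helper names and the 3x3x64 feature map with 100 classes are illustrative assumptions, chosen to match the simple CNN in this article's code.

```python
def global_average_pooling(feature_map):
    """Average each channel of an H x W x C nested list over all positions."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]


def dense_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs


def constant_map(height, width, channels):
    """Toy feature map in which channel c holds the value c everywhere."""
    return [[[float(c) for c in range(channels)]
             for _ in range(width)]
            for _ in range(height)]


# Different spatial sizes give the same output length (the channel count).
small = global_average_pooling(constant_map(3, 3, 4))
large = global_average_pooling(constant_map(7, 7, 4))
print(len(small), len(large))

# Head parameters for a 3x3x64 feature map and 100 classes.
fc_head = dense_params(3 * 3 * 64, 128) + dense_params(128, 100)
gap_head = dense_params(64, 100)
print(fc_head, gap_head)
```

Here the GAP output still passes through one small dense layer to reach 100 classes, as in the article's code; the Network in Network variant instead makes the last convolution emit one channel per class.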

SqueezeNet

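SqueezeNet is a compact network architecture proposed by F. N. Iandola, S. Han, et al. in the 2016 paper "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5MB model size". It reaches AlexNet-level accuracy with about 50x fewer parameters, and when combined with model compression techniques it shrinks to under 0.5 MB, about 510x smaller than the original AlexNet.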

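SqueezeNet proposes three network design strategies:

Strategy 1. Replace 3x3 filters with 1x1 filters.
This strategy is easy to understand: a 1x1 filter has 1/9 the parameters of a 3x3 filter, so in theory this change can shrink the replaced convolutional layers by a factor of 9.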
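Strategy 2. Decrease the number of input channels to 3x3 filters.
For a convolutional layer made up of 3x3 filters, the total number of parameters in the layer (ignoring biases) is:
$$P = N \times C \times 3 \times 3$$
where N is the number of filters, i.e. the number of output channels, and C is the number of input channels.
So to reduce the parameter count, it is not enough to reduce the number of 3x3 filters; the number of input channels to the 3x3 filters, C in the formula, must be reduced as well.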
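Strategy 3. Place downsampling as late in the network as possible.
In a convolutional neural network, whether the feature map output by a layer is downsampled is determined by the stride of the convolutional layer or by a pooling layer. A key observation is that higher-resolution feature maps (obtained by delaying downsampling) can lead to higher classification accuracy. This is also intuitive, since a higher-resolution input carries more information.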

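Here is an example. Suppose the input is 28×28×192 and the output feature map has 128 channels. Applying a 3×3 convolution directly takes 3×3×192×128 = 221,184 parameters.

If a 1×1 convolution first reduces the input to 96 channels and a 3×3 convolution then expands it to 128, the parameter count becomes 1×1×192×96 + 3×3×96×128 = 129,024, a reduction of about 42%. That saving is modest, but if the 1×1 output is reduced to 48 channels, the count halves again to 64,512.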
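
The arithmetic in this example can be reproduced with a few lines of Python. The helper names `conv_params` and `bottleneck_params` are illustrative, and biases are ignored, as in the parameter formula above.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights of a kernel x kernel convolution, ignoring biases."""
    return kernel * kernel * in_channels * out_channels


def bottleneck_params(in_channels, squeeze_channels, out_channels):
    """A 1x1 convolution down to squeeze_channels, then a 3x3 convolution."""
    return (conv_params(1, in_channels, squeeze_channels)
            + conv_params(3, squeeze_channels, out_channels))


direct = conv_params(3, 192, 128)
print("direct 3x3:", direct)

for squeeze in (96, 48):
    total = bottleneck_params(192, squeeze, 128)
    print(f"1x1 to {squeeze}, then 3x3: {total} "
          f"({total / direct:.0%} of direct)")
```

Trying other values for the squeeze width shows the trade-off directly: the parameter count falls almost linearly with the number of channels fed into the 3x3 convolution.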

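Experimental results:

To sum up: first use a 1x1 convolution to reduce the number of channels, then use a 3x3 convolution to expand it again, and the parameter count drops substantially.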

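Modeling in practice

The game's website is here.

Simple CNN model

The model architecture is as follows:

Training results: (parameters: 110,052; test accuracy: 92.92%; size: 401 KB)

TensorBoard	Training set	Validation set
Loss
Accuracy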

1x1 before 3x3

The model architecture is as follows:

Training results: (parameters: 106,236; test accuracy: 92.64%; size: 387 KB)

TensorBoard	Training set	Validation set
Loss
Accuracy

GAP

The model architecture is as follows:

Training results: (parameters: 29,796; test accuracy: 88.82% (90.88% after 20 epochs); size: 110 KB)

TensorBoard	Training set	Validation set
Loss
Accuracy

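Summary and comparison

Below is how each method performs after 10 epochs of training with the same settings:
On the simple CNN:

On LeNet:

As can be seen, GAP achieves the highest compression ratio but also converges the slowest. K1K3 (a 1x1 convolution before the 3x3 convolution) compresses poorly, mainly because neither base model has enough feature maps; with more than 100 convolutional layers the effect might be much more pronounced.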
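
The parameter counts reported for the three simple-CNN variants (110,052, 106,236 and 29,796) can be checked by hand. The sketch below recomputes them from the layer definitions in the code that follows, assuming a 28x28x1 input, 100 output classes, and biases on every layer; the helper names are illustrative.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs


# Shared front end: 3x3 conv to 16 channels, then 3x3 conv to 32 channels.
front = conv_params(3, 1, 16) + conv_params(3, 16, 32)

# A 28x28 input halved by three 2x2 poolings: 28 -> 14 -> 7 -> 3.
flat = 3 * 3 * 64
fc_head = dense_params(flat, 128) + dense_params(128, 100)

counts = {
    "simple CNN": front + conv_params(3, 32, 64) + fc_head,
    "1x1 before 3x3": (front + conv_params(1, 32, 24)
                       + conv_params(3, 24, 64) + fc_head),
    "GAP": front + conv_params(3, 32, 64) + dense_params(64, 100),
}

baseline = counts["simple CNN"]
for name, total in counts.items():
    print(f"{name}: {total:,} parameters "
          f"({baseline / total:.2f}x smaller than the simple CNN)")
```

The fully connected head alone accounts for 86,756 of the 110,052 baseline parameters, which is consistent with GAP giving the largest reduction and the 1x1 bottleneck giving only a small one on a network this shallow.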

Code

import os

import numpy as np
from tqdm import tqdm
import keras
from keras import layers
from keras.callbacks import TensorBoard
from joblib import dump, load

root = '/media/sunyan/文档/data'


def load_data(root, vfold_ratio=0.2, max_items_per_class=4000):
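    # Sort the file names so that class indices are stable across runs
    # (os.listdir returns entries in arbitrary order).
    all_files = sorted(os.listdir(root))
    files_paths = [os.path.join(root, i) for i in all_files]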
    # initialize variables
    x = np.empty([0, 784])
    y = np.empty([0])
    class_names = []

    # load each data file
    for idx, file in enumerate(tqdm(files_paths)):
        data = np.load(file)
        data = data[0: max_items_per_class, :]
        labels = np.full(data.shape[0], idx)

        x = np.concatenate((x, data), axis=0)
        y = np.append(y, labels)

        class_name, ext = os.path.splitext(os.path.basename(file))
        class_names.append(class_name)

    # randomize the dataset
    permutation = np.random.permutation(y.shape[0])
    x = x[permutation, :]
    y = y[permutation]

    # separate into training and testing
    vfold_size = int(x.shape[0] * vfold_ratio)

    x_test = x[0:vfold_size, :]
    y_test = y[0:vfold_size]

    x_train = x[vfold_size:x.shape[0], :]
    y_train = y[vfold_size:y.shape[0]]
    return x_train, y_train, x_test, y_test, class_names


def build_model():
    # Define model
    model = keras.Sequential()
    model.add(layers.Convolution2D(16, (3, 3),
                                   padding='same',
                                   input_shape=x_train.shape[1:], activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(32, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(64, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    print(model.summary())

    return model


def gap_model():
    # Define model
    model = keras.Sequential()
    model.add(layers.Convolution2D(16, (3, 3),
                                   padding='same',
                                   input_shape=x_train.shape[1:], activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(32, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(64, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    print(model.summary())

    return model

def one2three_model():
    model = keras.Sequential()
    model.add(layers.Convolution2D(16, (3, 3),padding='same',
                                   input_shape=x_train.shape[1:], activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(32, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Convolution2D(24, (1, 1), padding='same', activation='relu'))
    model.add(layers.Convolution2D(64, (3, 3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    print(model.summary())

    return model


def lenet():
    model = keras.Sequential()
    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    model.add(keras.layers.Convolution2D(filters=6, kernel_size=(5, 5), strides=(1, 1),
                                         padding='same', input_shape=x_train.shape[1:], activation='relu'))
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Layer 2: Convolutional. Output = 10x10x16.
    model.add(keras.layers.Convolution2D(filters=16, kernel_size=(5, 5), strides=(1, 1),
                                         padding='valid', activation='relu'))
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Flatten. Input = 5x5x16. Output = 400.
    model.add(keras.layers.Flatten())
    # Layer 3: Fully Connected. Input = 400. Output = 300.
    model.add(keras.layers.Dense(300, activation='relu'))
    # Layer 4: Fully Connected. Input = 300. Output = 200.
    model.add(keras.layers.Dense(200, activation='relu'))
    # Layer 5: Fully Connected. Input = 200. Output = 100.
    model.add(keras.layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    print(model.summary())

    return model


def lenet_one2three():
    model = keras.Sequential()
    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    model.add(keras.layers.Convolution2D(filters=6, kernel_size=(5, 5), strides=(1, 1),
                                         padding='same', input_shape=x_train.shape[1:], activation='relu'))
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Bottleneck: 1x1 convolution. Input = 14x14x6. Output = 14x14x3.
    model.add(layers.Convolution2D(3, kernel_size=(1, 1), strides=(1, 1), padding='same', activation='relu'))
    # Layer 2: Convolutional. Input = 14x14x3. Output = 10x10x16.
    model.add(keras.layers.Convolution2D(filters=16, kernel_size=(5, 5), strides=(1, 1),
                                         padding='valid', activation='relu'))
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Flatten. Input = 5x5x16. Output = 400.
    model.add(keras.layers.Flatten())
    # Layer 3: Fully Connected. Input = 400. Output = 300.
    model.add(keras.layers.Dense(300, activation='relu'))
    # Layer 4: Fully Connected. Input = 300. Output = 200.
    model.add(keras.layers.Dense(200, activation='relu'))
    # Layer 5: Fully Connected. Input = 200. Output = 100.
    model.add(keras.layers.Dense(100, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    print(model.summary())
    return model

def lenet_gap():
    model = keras.Sequential()
    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    model.add(keras.layers.Convolution2D(filters=6, kernel_size=(5, 5), strides=(1, 1),
                                         padding='same', input_shape=x_train.shape[1:], activation='relu'))
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
    # Layer 2: Convolutional. Output = 10x10x16.
    model.add(keras.layers.Convolution2D(filters=16, kernel_size=(5, 5), strides=(1, 1),
                                         padding='valid', activation='relu'))
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='valid'))
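    # Global average pooling. Input = 5x5x16. Output = 16.
    model.add(layers.GlobalAveragePooling2D())
    # Layer 3: Fully Connected. Input = 16. Output = 64.
    model.add(keras.layers.Dense(64, activation='relu'))
    # Layer 4: Fully Connected. Input = 64. Output = 100.
    model.add(keras.layers.Dense(100, activation='softmax'))
    # Compile model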
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['top_k_categorical_accuracy'])
    print(model.summary())
    return model

def pickle_load():
    with open('x_train.pkl', 'rb') as f:
        x_train= load(f)

    with open('y_train.pkl', 'rb') as f:
        y_train = load(f)

    with open('x_test.pkl', 'rb') as f:
        x_test = load(f)

    with open('y_test.pkl', 'rb') as f:
        y_test = load(f)

    with open('class_names.pkl', 'rb') as f:
        class_names = load(f)

    return x_train, y_train, x_test, y_test, class_names


if __name__ == '__main__':
    # x_train, y_train, x_test, y_test, class_names = load_data(root=root)

    # with open('x_train.pkl', 'wb') as f:
    #     dump(x_train, f)
    #
    # with open('y_train.pkl', 'wb') as f:
    #     dump(y_train, f)
    #
    # with open('x_test.pkl', 'wb') as f:
    #     dump(x_test, f)
    #
    # with open('y_test.pkl', 'wb') as f:
    #     dump(y_test, f)
    #
    # with open('class_names.pkl', 'wb') as f:
    #     dump(class_names, f)

    x_train, y_train, x_test, y_test, class_names = pickle_load()

    num_classes = len(class_names)
    image_size = 28
    print(len(x_train))

    # import matplotlib.pyplot as plt
    # from random import randint
    #
    # idx = randint(0, len(x_train))
    # plt.imshow(x_train[idx].reshape(28, 28))
    # print(class_names[int(y_train[idx].item())])

    # Reshape and normalize
    x_train = x_train.reshape(x_train.shape[0], image_size, image_size, 1).astype('float32')
    x_test = x_test.reshape(x_test.shape[0], image_size, image_size, 1).astype('float32')

    x_train /= 255.0
    x_test /= 255.0

    # Convert class vectors to class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    model = lenet_one2three()

    model.fit(x=x_train, y=y_train, validation_split=0.2, batch_size=256,
              verbose=2, epochs=10, callbacks=[TensorBoard(log_dir='log')])

    score = model.evaluate(x_test, y_test, verbose=0)
    # score[1] is top_k_categorical_accuracy, which is top-5 by default in Keras
    print('Test top-5 accuracy: {:0.2f}%'.format(score[1] * 100))

    model.save('keras.h5')

    with open('class_names.txt', 'w') as file_handler:
        for item in class_names:
            file_handler.write("{}\n".format(item))
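
The models above are compiled with `top_k_categorical_accuracy`, which in Keras counts a prediction as correct when the true class is among the 5 highest-scoring classes by default. As a hedged sketch of how the saved `class_names.txt` could be used at prediction time, the helpers below map a probability vector to its top-k class names. The function names, the class names, and the probabilities are made up for illustration; obtaining real probabilities from the saved model is not shown.

```python
def read_class_names(path):
    """Read one class name per line, as written to class_names.txt above."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def top_k_labels(probabilities, class_names, k=5):
    """Return the k most probable (name, probability) pairs, best first."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return [(class_names[i], probabilities[i]) for i in ranked[:k]]


# Hypothetical class names and softmax output for a single drawing.
names = ["cat", "dog", "tree", "car"]
probs = [0.10, 0.05, 0.60, 0.25]
print(top_k_labels(probs, names, k=2))
```

In a drawing game, showing the top few guesses rather than a single label matches how the metric scores the model.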