
Deep Convolutional-AutoEncoder

Let me emphasize that this is not a tutorial on convolutional autoencoders and how they work, but only on their implementation in TensorFlow. There is no shortage of material on the details of convolutional neural networks on the web. Further, the tutorial is aimed at general users, so I emphasized the simplicity and readability of the code.

Package and data import

Like any other Python program, we start by importing the required packages:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

n_classes = 10
batch_size = 100

# mnist data image of shape 28*28=784
x = tf.placeholder(tf.float32, [None, 784], name='InputData')
# 0-9 digits recognition => 10 classes
y = tf.placeholder(tf.float32, [None, 10], name='LabelData')

logs_path = "./logs/"

Note that logs_path is where the file writer will write the logs for TensorBoard. I make a directory called "logs" in the folder containing the program and archive the logs there.
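
If the directory does not exist yet, you can have the program create it for you. A small convenience sketch (my addition, not part of the original code):

import os

# create the logs directory if it is not already there
if not os.path.exists(logs_path):
    os.makedirs(logs_path)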

Defining the layers

Obviously, to have a neural network, we need layers. Here I first define some generic layer wrappers, each of which places its operations inside a tf.name_scope. Practically, a name_scope creates a nice box in the TensorBoard graph which contains all the operations defined under it. The wrappers below are kept simple, but you can definitely modify them and add more arguments to the functions.
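
To see concretely what name_scope does, here is a tiny standalone illustration (not part of the model): every op created inside the scope gets the scope name as a prefix, and TensorBoard uses that prefix to group the ops into one box.

with tf.name_scope('demo'):
    a = tf.constant(1.0, name='a')
print(a.name)  # prints "demo/a:0" -- the scope name becomes a prefix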

Convolutional layer:

def conv2d(input, name, kshape, strides=[1, 1, 1, 1]):
    # kshape = [filter_height, filter_width, in_channels, out_channels]
    with tf.name_scope(name):
        W = tf.get_variable(name='w_'+name,
                            shape=kshape,
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        b = tf.get_variable(name='b_' + name,
                            shape=[kshape[3]],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        out = tf.nn.conv2d(input,W,strides=strides, padding='SAME')
        out = tf.nn.bias_add(out, b)
        out = tf.nn.relu(out)
        return out
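
As a quick illustration of the filter-shape convention (a throwaway snippet, not part of the model):

# a 5x5 convolution taking 1 input channel to 25 feature maps;
# with 'SAME' padding and unit strides, the spatial size is preserved
img = tf.placeholder(tf.float32, [None, 28, 28, 1])
c = conv2d(img, name='c_demo', kshape=[5, 5, 1, 25])
print(c.shape)  # (?, 28, 28, 25)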

Maxpooling layer:

def maxpool2d(x,name,kshape=[1, 2, 2, 1], strides=[1, 2, 2, 1]):
    with tf.name_scope(name):
        out = tf.nn.max_pool(x,
                             ksize=kshape, #size of window
                             strides=strides,
                             padding='SAME')
        return out

Fully-connected layer:

def fullyConnected(input, name, output_size):
    with tf.name_scope(name):
        # flatten everything except the batch dimension
        input_size = input.shape[1:]
        input_size = int(np.prod(input_size))
        W = tf.get_variable(name='w_'+name,
                            shape=[input_size, output_size],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        b = tf.get_variable(name='b_'+name,
                            shape=[output_size],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        input = tf.reshape(input, [-1, input_size])
        out = tf.nn.relu(tf.add(tf.matmul(input, W), b))
        return out

Dropout layer:

def dropout(input, name, keep_rate):
    with tf.name_scope(name):
        out = tf.nn.dropout(input, keep_rate)
        return out

Deconvolutional layer:

The trick with the convolutional autoencoder comes down to the deconvolution layer, which does the opposite of convolution. Perhaps surprisingly, that is nothing but the transpose of convolution. Unfortunately, tf.nn.conv2d_transpose has a big flaw: the function requires the output shape in advance, which is a real problem when you want to leave the batch size undecided until runtime. There is a workaround, which I do not recommend, but if for some reason you must use tf.nn.conv2d_transpose, here it is:

def deconv2d(input, name, kshape, strides=[1, 1, 1, 1]):
    # for tf.nn.conv2d_transpose the filter format is
    # [height, width, output_channels, in_channels], hence kshape[2] below
    with tf.name_scope(name):
        W = tf.get_variable(name='w_' + name,
                            shape=kshape,
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        b = tf.get_variable(name='b_' + name,
                            shape=[kshape[2]],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        # using tf.shape(input)[0] here is important: it passes the dynamic batch size to the deconv layer
        output_shape = tf.stack([tf.shape(input)[0],input.shape[1],input.shape[2],kshape[2]])
        out = tf.nn.conv2d_transpose(input, W,
                                     output_shape=output_shape,
                                     strides=strides, padding='SAME')
        out = tf.nn.bias_add(out, b)
        out = tf.nn.relu(out)
        return out

I, on the other hand, prefer to use tf.contrib.layers.conv2d_transpose. Note that its kshape argument is only the kernel height and width; the number of output feature maps is passed separately as n_outputs:

def deconv2d(input, name, kshape, n_outputs, strides=[1, 1]):
    with tf.name_scope(name):
        out = tf.contrib.layers.conv2d_transpose(input,
                                                 num_outputs= n_outputs,
                                                 kernel_size=kshape,
                                                 stride=strides,
                                                 padding='SAME',
                                                 weights_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False),
                                                 biases_initializer=tf.contrib.layers.xavier_initializer(uniform=False),
                                                 activation_fn=tf.nn.relu)
        return out

Upsampling layer:

Finally, just as we use maxpooling to downsample on the encoder side, we need something to upsample on the decoder side. You can use tf.image.resize_bilinear for that:

def upsample(input, name, factor=[2,2]):
    size = [int(input.shape[1] * factor[0]), int(input.shape[2] * factor[1])]
    with tf.name_scope(name):
        out = tf.image.resize_bilinear(input, size=size)
        return out
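
For instance, a 14x14 feature map upsampled by a factor of 2 in each dimension comes back at the original 28x28 size (again, just a throwaway check):

t = tf.placeholder(tf.float32, [None, 14, 14, 25])
u = upsample(t, name='up_demo')
print(u.shape)  # (?, 28, 28, 25)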

Model

Finally we are ready to build our deep autoencoder and see the fruit of going through the trouble of defining all the wrappers above. Let us call it ConvAutoEncoder.

def ConvAutoEncoder(x, name):
    with tf.name_scope(name):
        
        input = tf.reshape(x, shape=[-1, 28, 28, 1])

        # encoding part
        c1 = conv2d(input, name='c1', kshape=[5, 5, 1, 25])    # -> (?, 28, 28, 25)
        p1 = maxpool2d(c1, name='p1')                          # -> (?, 14, 14, 25)
        do1 = dropout(p1, name='do1', keep_rate=0.75)
        do1 = tf.reshape(do1, shape=[-1, 14*14*25])
        fc1 = fullyConnected(do1, name='fc1', output_size=14*14*5)
        do2 = dropout(fc1, name='do2', keep_rate=0.75)
        fc2 = fullyConnected(do2, name='fc2', output_size=14*14)  # bottleneck: 196 units
        # decoding part
        fc3 = fullyConnected(fc2, name='fc3', output_size=14*14*5)
        do3 = dropout(fc3, name='do3', keep_rate=0.75)
        fc4 = fullyConnected(do3, name='fc4', output_size=14*14*25)
        do4 = dropout(fc4, name='do4', keep_rate=0.75)
        do4 = tf.reshape(do4, shape=[-1, 14, 14, 25])
        dc1 = deconv2d(do4, name='dc1', kshape=[5, 5], n_outputs=25)   # -> (?, 14, 14, 25)
        up1 = upsample(dc1, name='up1', factor=[2, 2])                 # -> (?, 28, 28, 25)
        output = fullyConnected(up1, name='output', output_size=28*28) # -> (?, 784)
        with tf.name_scope('cost'):
            cost = tf.reduce_mean(tf.square(tf.subtract(output, x)))
        return output, cost

I think you will also be amazed by how we built such a big network with so few lines!

Training

You can probably find this part in any TensorFlow tutorial, as every network needs to be trained. We also need to collect the logs. Here I only train the autoencoder for 5 epochs to make sure it works, but you can definitely increase that.

def train_network(x):
    prediction, cost = ConvAutoEncoder(x, 'ConvAutoEnc')
    with tf.name_scope('opt'):
        optimizer = tf.train.AdamOptimizer().minimize(cost)

    # Create a summary to monitor cost tensor
    tf.summary.scalar("cost", cost)

    # Merge all summaries into a single op
    merged_summary_op = tf.summary.merge_all()

    n_epochs = 5
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # create log writer object
        writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())

        for epoch in range(n_epochs):
            avg_cost = 0
            n_batches = int(mnist.train.num_examples / batch_size)
            # Loop over all batches
            for i in range(n_batches):
                batch_x, batch_y = mnist.train.next_batch(batch_size)
                # Run optimization op (backprop) and cost op (to get loss value);
                # note that the labels batch_y are not needed for an autoencoder
                _, c, summary = sess.run([optimizer, cost, merged_summary_op], feed_dict={x: batch_x})
                # Compute average loss
                avg_cost += c / n_batches
                # write log
                writer.add_summary(summary, epoch * n_batches + i)

            # Display logs per epoch step
            print('Epoch', epoch+1, ' / ', n_epochs, 'cost:', avg_cost)
        print('Optimization Finished')
        print('Cost:', cost.eval({x: mnist.test.images}))
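
Since matplotlib is already imported, it is also nice to eyeball a few reconstructions. Here is a minimal sketch (my addition, not in the original code), meant to be pasted inside the Session block right after the training loop, since it reuses the local prediction tensor:

        # visualize a few test digits next to their reconstructions
        n_show = 5
        test_x = mnist.test.images[:n_show]
        recon = sess.run(prediction, feed_dict={x: test_x})
        fig, axes = plt.subplots(2, n_show)
        for i in range(n_show):
            axes[0][i].imshow(test_x[i].reshape(28, 28), cmap='gray')  # original
            axes[1][i].imshow(recon[i].reshape(28, 28), cmap='gray')   # reconstruction
            axes[0][i].axis('off')
            axes[1][i].axis('off')
        plt.show()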

Output

Finally, train the autoencoder simply by using

train_network(x)

As mentioned, I only let the program run for 5 epochs to make sure it works fine, and here is the output.

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Epoch 1  /  5 cost: 0.0473340077495
Epoch 2  /  5 cost: 0.0307401267168
Epoch 3  /  5 cost: 0.0280541780659
Epoch 4  /  5 cost: 0.0268382367187
Epoch 5  /  5 cost: 0.0261269084534
Optimization Finished
Cost: 0.0255299

Process finished with exit code 0

Graph of the autoencoder:

Finally, we are ready to see how the graph looks in TensorBoard. For that, open a terminal, "cd" into the logs directory we created at the start of this tutorial, and type

tensorboard --logdir=.

and then open the URL http://0.0.0.0:6006 in a browser. If you have done everything as instructed, you will see something like this:

and if you expand the ConvAutoEncoder box, you will see your network cleanly arranged layer by layer inside:

Hope this was helpful.