Let me emphasize that this is not a tutorial on convolutional autoencoders and how they work, but only on their implementation in TensorFlow. There is no lack of material on the details of convolutional neural networks on the web. Further, the tutorial is aimed at general users, so I have emphasized simplicity and readability of the code.
Like any other Python program, we start by importing the required packages:
```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

n_classes = 10
batch_size = 100

# mnist data image of shape 28*28=784
x = tf.placeholder(tf.float32, [None, 784], name='InputData')
# 0-9 digits recognition => 10 classes
y = tf.placeholder(tf.float32, [None, 10], name='LabelData')

logs_path = "./logs/"
```
Note that logs_path is where the tf.summary.FileWriter will write the logs for TensorBoard. I make a directory named "logs" in the folder containing the program and archive the logs there.
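If the directory does not exist yet, you can also create it from the script itself. This small addition is not part of the original code, just a convenience using the standard library:

```python
import os

# Create the log directory if it does not exist yet.
if not os.path.exists(logs_path):
    os.makedirs(logs_path)
```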
To build a neural network, we need layers. Here I first define some generic wrappers that place each layer inside a tf.name_scope. Practically, name_scope creates a nice box in the TensorBoard graph which contains all the variables and operations under it. The wrappers below are kept simple, but you can definitely modify them and add more arguments to the functions.
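To see what name_scope does, here is a minimal example (not part of the tutorial code): every op created under the scope gets the scope name as a prefix, which is what groups ops into one box in the TensorBoard graph.

```python
with tf.name_scope('demo'):
    a = tf.constant(1.0, name='alpha')
print(a.name)  # prints: demo/alpha:0
```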
```python
def conv2d(input, name, kshape, strides=[1, 1, 1, 1]):
    with tf.name_scope(name):
        # kshape = [filter_height, filter_width, in_channels, out_channels]
        W = tf.get_variable(name='w_'+name,
                            shape=kshape,
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        b = tf.get_variable(name='b_'+name,
                            shape=[kshape[3]],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        out = tf.nn.conv2d(input, W, strides=strides, padding='SAME')
        out = tf.nn.bias_add(out, b)
        out = tf.nn.relu(out)
        return out
```
```python
def maxpool2d(x, name, kshape=[1, 2, 2, 1], strides=[1, 2, 2, 1]):
    with tf.name_scope(name):
        out = tf.nn.max_pool(x,
                             ksize=kshape,  # size of the pooling window
                             strides=strides,
                             padding='SAME')
        return out
```
```python
def fullyConnected(input, name, output_size):
    with tf.name_scope(name):
        # Flatten all dimensions except the batch dimension.
        input_size = input.shape[1:]
        input_size = int(np.prod(input_size))
        W = tf.get_variable(name='w_'+name,
                            shape=[input_size, output_size],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        b = tf.get_variable(name='b_'+name,
                            shape=[output_size],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        input = tf.reshape(input, [-1, input_size])
        out = tf.nn.relu(tf.add(tf.matmul(input, W), b))
        return out
```
```python
def dropout(input, name, keep_rate):
    with tf.name_scope(name):
        out = tf.nn.dropout(input, keep_rate)
        return out
```
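As a quick sanity check (illustrative, not part of the original code; the layer names here are made up), you can apply the wrappers to the input placeholder and inspect the static shapes. With 'SAME' padding, conv2d keeps the spatial size and maxpool2d halves it:

```python
img = tf.reshape(x, [-1, 28, 28, 1])
c = conv2d(img, name='c_demo', kshape=[5, 5, 1, 25])
p = maxpool2d(c, name='p_demo')
print(c.shape)  # (?, 28, 28, 25)
print(p.shape)  # (?, 14, 14, 25)
```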
The trick with a convolutional autoencoder comes down to the deconvolution layer, which does the opposite of convolution. To many people's surprise, that is nothing but the transpose of convolution. Unfortunately, tf.nn.conv2d_transpose has a big flaw: the function requires the output shape in advance. This is a huge problem when you want to decide the batch size later. There is a workaround, which I do not recommend; but if for some reason you must use tf.nn.conv2d_transpose, here is the solution:
```python
def deconv2d(input, name, kshape, strides=[1, 1, 1, 1]):
    with tf.name_scope(name):
        # For conv2d_transpose, kshape = [height, width, out_channels, in_channels],
        # so kshape[2] is the number of output channels.
        W = tf.get_variable(name='w_'+name,
                            shape=kshape,
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        b = tf.get_variable(name='b_'+name,
                            shape=[kshape[2]],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        # tf.shape(input)[0] is really important to pass the dynamic batch size
        # to the deconv layer
        output_shape = tf.stack([tf.shape(input)[0], input.shape[1], input.shape[2], kshape[2]])
        out = tf.nn.conv2d_transpose(input, W,
                                     output_shape=output_shape,
                                     strides=strides, padding='SAME')
        out = tf.nn.bias_add(out, b)
        out = tf.nn.relu(out)
        return out
```
I, on the other hand, prefer to use tf.contrib.layers.conv2d_transpose:
```python
def deconv2d(input, name, kshape, n_outputs, strides=[1, 1]):
    with tf.name_scope(name):
        out = tf.contrib.layers.conv2d_transpose(input,
                                                 num_outputs=n_outputs,
                                                 kernel_size=kshape,
                                                 stride=strides,
                                                 padding='SAME',
                                                 weights_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False),
                                                 biases_initializer=tf.contrib.layers.xavier_initializer(uniform=False),
                                                 activation_fn=tf.nn.relu)
        return out
```
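As a quick illustration (the names here are made up for the example), the contrib version infers the output shape itself, including the dynamic batch dimension, so nothing has to be specified in advance:

```python
t = tf.placeholder(tf.float32, [None, 14, 14, 25])
d = deconv2d(t, name='dc_demo', kshape=[5, 5], n_outputs=25)
print(d.shape)  # (?, 14, 14, 25) -- stride 1 with 'SAME' padding keeps the spatial size
```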
Similar to the max-pooling that we use for downsampling on the encoder side, we need something for upsampling on the decoder side. You can use tf.image.resize_bilinear for that:
```python
def upsample(input, name, factor=[2, 2]):
    size = [int(input.shape[1] * factor[0]), int(input.shape[2] * factor[1])]
    with tf.name_scope(name):
        out = tf.image.resize_bilinear(input, size=size, align_corners=None, name=None)
        return out
```
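A one-line check (again, purely illustrative) shows the effect on the static shape:

```python
u = upsample(tf.placeholder(tf.float32, [None, 14, 14, 25]), name='up_demo')
print(u.shape)  # (?, 28, 28, 25) -- the default factor [2, 2] doubles height and width
```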
Finally we are ready to build our deep autoencoder and see the fruit of going through the trouble of defining all the wrappers above. Let us call it ConvAutoEncoder.
```python
def ConvAutoEncoder(x, name):
    with tf.name_scope(name):
        input = tf.reshape(x, shape=[-1, 28, 28, 1])

        # coding part
        c1 = conv2d(input, name='c1', kshape=[5, 5, 1, 25])
        p1 = maxpool2d(c1, name='p1')
        do1 = dropout(p1, name='do1', keep_rate=0.75)
        do1 = tf.reshape(do1, shape=[-1, 14*14*25])
        fc1 = fullyConnected(do1, name='fc1', output_size=14*14*5)
        do2 = dropout(fc1, name='do2', keep_rate=0.75)
        fc2 = fullyConnected(do2, name='fc2', output_size=14*14)

        # Decoding part
        fc3 = fullyConnected(fc2, name='fc3', output_size=14*14*5)
        do3 = dropout(fc3, name='do3', keep_rate=0.75)
        fc4 = fullyConnected(do3, name='fc4', output_size=14*14*25)
        do4 = dropout(fc4, name='do4', keep_rate=0.75)
        do4 = tf.reshape(do4, shape=[-1, 14, 14, 25])
        dc1 = deconv2d(do4, name='dc1', kshape=[5, 5], n_outputs=25)
        up1 = upsample(dc1, name='up1', factor=[2, 2])
        output = fullyConnected(up1, name='output', output_size=28*28)
        with tf.name_scope('cost'):
            cost = tf.reduce_mean(tf.square(tf.subtract(output, x)))
        return output, cost
```
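For reference, here is how the tensor shapes evolve through the network (the batch dimension is shown as ?):

```python
# input   (?, 28, 28, 1)
# c1      (?, 28, 28, 25)   conv with 'SAME' padding keeps 28x28
# p1      (?, 14, 14, 25)   max-pooling halves the spatial size
# do1     (?, 4900)         after the reshape, 14*14*25 = 4900
# fc1     (?, 980)          14*14*5 = 980
# fc2     (?, 196)          14*14 = 196, the bottleneck
# fc3     (?, 980)
# fc4     (?, 4900)
# do4     (?, 14, 14, 25)   after the reshape
# dc1     (?, 14, 14, 25)   transposed conv, stride 1
# up1     (?, 28, 28, 25)   bilinear upsampling
# output  (?, 784)          784 = 28*28, matching the input placeholder
```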
I think you will be amazed by how we built such a big network with so few lines!
You can probably find this part in any TensorFlow tutorial, as any network needs to be trained. We also need to collect the logs. Here I only train the autoencoder for 5 epochs to make sure it works, but you can definitely increase that.
```python
def train_network(x):
    prediction, cost = ConvAutoEncoder(x, 'ConvAutoEnc')
    with tf.name_scope('opt'):
        optimizer = tf.train.AdamOptimizer().minimize(cost)

    # Create a summary to monitor the cost tensor
    tf.summary.scalar("cost", cost)
    # Merge all summaries into a single op
    merged_summary_op = tf.summary.merge_all()

    n_epochs = 5
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # create the log writer object
        writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
        for epoch in range(n_epochs):
            avg_cost = 0
            n_batches = int(mnist.train.num_examples / batch_size)
            # Loop over all batches
            for i in range(n_batches):
                batch_x, batch_y = mnist.train.next_batch(batch_size)
                # Run optimization op (backprop) and cost op (to get loss value)
                _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                         feed_dict={x: batch_x, y: batch_y})
                # Compute average loss
                avg_cost += c / n_batches
                # write log
                writer.add_summary(summary, epoch * n_batches + i)
            # Display logs per epoch step
            print('Epoch', epoch + 1, ' / ', n_epochs, 'cost:', avg_cost)
        print('Optimization Finished')
        print('Cost:', cost.eval({x: mnist.test.images}))
```
Finally, train the autoencoder simply by calling

```python
train_network(x)
```
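Note that the matplotlib import at the top is not used in the code above. A natural use for it, sketched below under the assumption that you add it inside the session at the end of train_network (after the training loop, where prediction and sess are in scope), is to plot a few test digits next to their reconstructions:

```python
# Hypothetical addition, placed inside the `with tf.Session() as sess:` block
# after training: compare original test digits with their reconstructions.
n_show = 5
originals = mnist.test.images[:n_show]
reconstructions = sess.run(prediction, feed_dict={x: originals})
plt.figure(figsize=(2 * n_show, 4))
for i in range(n_show):
    plt.subplot(2, n_show, i + 1)           # top row: originals
    plt.imshow(originals[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    plt.subplot(2, n_show, n_show + i + 1)  # bottom row: reconstructions
    plt.imshow(reconstructions[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
```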
As mentioned, I only let the program run for 5 epochs to make sure it works fine, and here is the output:
```
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Epoch 1 / 5 cost: 0.0473340077495
Epoch 2 / 5 cost: 0.0307401267168
Epoch 3 / 5 cost: 0.0280541780659
Epoch 4 / 5 cost: 0.0268382367187
Epoch 5 / 5 cost: 0.0261269084534
Optimization Finished
Cost: 0.0255299

Process finished with exit code 0
```
Finally, we are ready to see how the graph looks in TensorBoard. For that, open a terminal, "cd" to the logs directory which we created at the start of this tutorial, and type

```
tensorboard --logdir=.
```

then open the URL http://0.0.0.0:6006 in a browser. If you have done everything as instructed, you will see something like this:
![](https://github.com/arashsaber/Deep-Convolutional-AutoEncoder/raw/master/graph-run1.png)
and if you open the ConvAutoEncoder box, you will see your network cleanly arranged layer by layer in there:
![](https://github.com/arashsaber/Deep-Convolutional-AutoEncoder/raw/master/graph-run2.png)
Hope this was helpful.