
Tensorflow MNIST Beginner notes




Here’s my jupyter notebook for this lesson.

Tensorflow Tutorial #1 - MNIST For ML Beginners

TensorFlow lets us describe a graph of interacting operations that run entirely outside Python.

Note, all blockquotes in this notebook are quotes from the original source.

First we need to import tensorflow and our training dataset.

In [1]:
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
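The one_hot=True argument stores each label as a 10-element vector with a single 1 at the digit's index. Here is a minimal NumPy sketch of that encoding; the helper name to_one_hot is made up for illustration and is not part of TensorFlow.

```python
import numpy as np

def to_one_hot(labels, num_classes=10):
    """Turn integer digit labels into one-hot rows (illustrative helper)."""
    labels = np.asarray(labels)
    one_hot = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Put a 1.0 in the column that matches each label
    one_hot[np.arange(labels.size), labels] = 1.0
    return one_hot

encoded = to_one_hot([3, 0, 9])
print(encoded.shape)  # (3, 10)
print(encoded[0])     # 1.0 at index 3, zeros elsewhere
```

Taking the argmax of each row recovers the original digit, which is how the labels are compared with predictions later on.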

Input data and Placeholders

In this graph, we describe these interacting operations by manipulating symbolic variables. The ones used for input are called placeholders: a placeholder is a value we feed into the computation graph when we ask TensorFlow to run it.

Each image is 28x28 pixels, flattened into a 784-dimensional vector. A batch of images is therefore a 2-D tensor of floating point numbers with a shape of [None, 784], where None means that dimension (the batch size) can be of any length.

In [2]:
# x is the placeholder for our input images
x = tf.placeholder(tf.float32, [None, 784])
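To make the [None, 784] shape concrete, here is a small NumPy sketch using a made-up batch of random images. The array names are illustrative, and it assumes row-major flattening, which is what NumPy's reshape does by default.

```python
import numpy as np

# A hypothetical batch of 5 grayscale images, 28x28 pixels each
rng = np.random.default_rng(0)
images = rng.random((5, 28, 28), dtype=np.float32)

# Flatten each image into one row of 784 values; -1 lets NumPy infer the batch size
flat = images.reshape(-1, 28 * 28)
print(flat.shape)  # (5, 784)

# Pixel (row 2, column 3) of the first image lands at index 2 * 28 + 3
assert flat[0, 2 * 28 + 3] == images[0, 2, 3]
```

The batch size of 5 plays the role of None here: any number of rows would work, as long as each row has 784 values.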

Weights and biases and Variables

We need the weights and biases for our model. These are created using tf.Variable; Variables can be modified during computation.

W has a shape of [784, 10] because we want to multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence for the different classes.

b has a shape of [10] so we can add it to the output.

We initialize both as tensors full of zeros.

In [3]:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

Create our model

Next we create our model. Here's an explanation from the TensorFlow docs of what's happening.

First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had $Wx$, as a small trick to deal with x being a 2D tensor with multiple inputs. We then add b, and finally apply tf.nn.softmax.

In [4]:
y = tf.nn.softmax(tf.matmul(x, W) + b)
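As a rough picture of what this line computes once data is fed in, here is the same forward pass in plain NumPy. The input batch is made up, and the names W0 and b0 are used only to avoid clashing with the notebook's variables.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

x_batch = np.random.default_rng(0).random((4, 784))  # 4 made-up flattened images
W0 = np.zeros((784, 10))  # same zero initialization as in the notebook
b0 = np.zeros(10)

# [4, 784] @ [784, 10] + [10] -> [4, 10]
probs = softmax(x_batch @ W0 + b0)
print(probs.shape)  # (4, 10)
print(probs[0])     # every class gets probability 0.1 before training
```

With all-zero weights every logit is 0, so the untrained model spreads its probability evenly over the ten digits.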


We train models with an objective to reduce the loss/error/cost. We want to minimize this error to make our model more accurate.

A very common function to determine the loss of a model is called "cross-entropy".

\begin{align} H_{y'}(y) = -\sum_i y'_i \log(y_i) \end{align}

The cross-entropy is measuring how inefficient our predictions are for describing the truth.

$y$ is our predicted probability distribution

$y'$ (y-prime) is the true distribution (the one-hot vector with the digit labels)

To implement cross-entropy we need to first add a new placeholder to input the correct answers.

In [5]:
y_ = tf.placeholder(tf.float32, [None, 10])

Next we implement the cross-entropy function:

In [6]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

First, tf.log computes the logarithm of each element of y.
Next, we multiply each element of y_ with the corresponding element of tf.log(y).
Then tf.reduce_sum adds the elements in the second dimension of y, due to the reduction_indices=[1] parameter.
Finally, tf.reduce_mean computes the mean over all the examples in the batch.
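A worked example for a single image may help. The sketch below evaluates the cross-entropy formula in plain Python for two made-up predictions over three classes; the numbers are illustrative only.

```python
import math

def cross_entropy(true_dist, predicted_dist):
    """H_{y'}(y) = -sum_i y'_i * log(y_i) for one example."""
    # Terms where y'_i is 0 contribute nothing, so they are skipped
    return -sum(t * math.log(p) for t, p in zip(true_dist, predicted_dist) if t > 0)

y_true = [0, 0, 1]            # one-hot: the correct class is index 2
confident = [0.1, 0.1, 0.8]   # most of the probability on the right class
unsure = [0.4, 0.4, 0.2]      # most of the probability on the wrong classes

print(cross_entropy(y_true, confident))  # -log(0.8), about 0.223
print(cross_entropy(y_true, unsure))     # -log(0.2), about 1.609
```

Because the true distribution is one-hot, the loss reduces to the negative log of the probability assigned to the correct class, so better predictions give a smaller loss.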

Note that in the source code, we don't use this formulation, because it is numerically unstable. Instead, we apply tf.nn.softmax_cross_entropy_with_logits on the unnormalized logits (e.g., we call softmax_cross_entropy_with_logits on tf.matmul(x, W) + b), because this more numerically stable function internally computes the softmax activation. In your code, consider using tf.nn.softmax_cross_entropy_with_logits instead.
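To see why the direct formulation is fragile, consider what happens when the softmax output for the true class underflows to exactly 0: the logarithm is no longer finite. The NumPy sketch below contrasts that with a version that works on the logits directly. It illustrates the idea only and is not TensorFlow's actual implementation; the logits are deliberately extreme, made-up values.

```python
import numpy as np

def naive_cross_entropy(logits, one_hot):
    """Softmax followed by log, as in the formula above."""
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # Silence the warnings so the non-finite result can be inspected
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.sum(one_hot * np.log(probs))

def stable_cross_entropy(logits, one_hot):
    """Works on logits: log softmax(z) = z - max - log(sum(exp(z - max)))."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -np.sum(one_hot * log_probs)

logits = np.array([1000.0, 0.0, -1000.0])  # extreme, made-up logits
target = np.array([0.0, 0.0, 1.0])         # the true class is the least likely one

print(naive_cross_entropy(logits, target))   # not finite, because log(0) appears
print(stable_cross_entropy(logits, target))  # 2000.0
```

For moderate logits the two functions agree, which is why the problem tends to show up only once training pushes some probabilities very close to 0.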

Because TensorFlow knows the entire graph of your computations, it can automatically use the backpropagation algorithm to efficiently determine how your variables affect the loss you ask it to minimize. Then it can apply your choice of optimization algorithm to modify the variables and reduce the loss.

Next, we ask TensorFlow to minimize cross_entropy using the gradient descent algorithm. The original tutorial uses a learning rate of 0.5; the cell below uses 0.05.

What TensorFlow actually does here, behind the scenes, is to add new operations to your graph which implement backpropagation and gradient descent. Then it gives you back a single operation which, when run, does a step of gradient descent training, slightly tweaking your variables to reduce the loss.

In [7]:
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
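For this particular model the gradients can also be written out by hand, which gives a feel for what one run of train_step does. The sketch below is a hand-derived NumPy version for softmax regression with mean cross-entropy, using random made-up data; TensorFlow arrives at the same kind of update through automatic differentiation rather than this formula.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(X, Y, W, b):
    """Mean cross-entropy over the batch."""
    return -np.mean(np.sum(Y * np.log(softmax(X @ W + b)), axis=1))

def gradient_step(X, Y, W, b, learning_rate):
    """One step of gradient descent for softmax regression."""
    P = softmax(X @ W + b)
    grad_W = X.T @ (P - Y) / len(X)  # gradient of the loss with respect to W
    grad_b = (P - Y).mean(axis=0)    # gradient of the loss with respect to b
    return W - learning_rate * grad_W, b - learning_rate * grad_b

rng = np.random.default_rng(0)
X = rng.random((100, 784))                     # made-up "images"
Y = np.eye(10)[rng.integers(0, 10, size=100)]  # made-up one-hot labels
W, b = np.zeros((784, 10)), np.zeros(10)

before = loss(X, Y, W, b)
W, b = gradient_step(X, Y, W, b, learning_rate=0.05)
after = loss(X, Y, W, b)
print(before, after)  # the loss should drop slightly after one step
```

Starting from zero weights the loss is log(10), about 2.30, since every class is predicted with probability 0.1.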

Run our model

Now that we've set up and defined our model, we can launch it in a session.

In [8]:
# Create an InteractiveSession 
sess = tf.InteractiveSession()

# Create an operation to initialize the variables we created, and run it
tf.global_variables_initializer().run()

In [9]:
for _ in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})


We need to evaluate our model to see how well it did.

First let's figure out where we predicted the correct label. tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the correct label. We can use tf.equal to check if our prediction matches the truth.

In [10]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

That gives us a list of booleans.

To determine what fraction are correct, we cast to floating point numbers and then take the mean.

For example, [True, False, True, True] would become [1,0,1,1] which would become 0.75.

In [11]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
In [12]:
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
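The same argmax, compare, cast and mean pipeline can be tried on a tiny made-up example in NumPy, which reproduces the 0.75 from the explanation above.

```python
import numpy as np

# Made-up predictions for 4 examples over 3 classes (each row sums to 1)
predicted = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.3, 0.6],
                      [0.2, 0.5, 0.3],
                      [0.1, 0.1, 0.8]])
# One-hot true labels: classes 0, 1, 1, 2
truth = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]])

# Compare the most likely predicted class with the true class, row by row
correct = np.argmax(predicted, axis=1) == np.argmax(truth, axis=1)
print(correct.tolist())                   # [True, False, True, True]
print(correct.astype(np.float32).mean())  # 0.75
```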


We've learned about Softmax regression and created, trained and evaluated a model.

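We can see in the table below that we hit around 92% accuracy after 10,000 training steps. The table labels these epochs, but going by the training loop above, each one is a single mini-batch step of 100 images rather than a full pass over the data.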

We need a better model to increase this further, which we'll do next.

In [13]:
import pandas
from IPython.display import display, HTML

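# Test accuracy measured after different numbers of training steps
data = [["500", 0.8908],
        ["1,000", 0.9010],
        ["2,000", 0.9143],
        ["5,000", 0.9181],
        ["10,000", 0.9213]]
df = pandas.DataFrame(data, columns=['Epochs', 'Percentage'])
display(HTML(df.to_html(index=False)))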
Epochs Percentage
500 0.8908
1,000 0.9010
2,000 0.9143
5,000 0.9181
10,000 0.9213
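If the counts in the table are runs of the training loop (an assumption based on the loop shown earlier, which draws 100 images per step), they can be converted into full passes over the training set. The sketch below assumes mnist.train holds 55,000 images, which is what read_data_sets gives when its validation_size is left at the default of 5,000.

```python
TRAIN_EXAMPLES = 55_000  # assumed size of mnist.train (default validation split)
BATCH_SIZE = 100         # batch size used in the training loop

def steps_to_epochs(steps, batch_size=BATCH_SIZE, dataset_size=TRAIN_EXAMPLES):
    """Convert a number of mini-batch steps into passes over the training set."""
    return steps * batch_size / dataset_size

for steps in (500, 1_000, 2_000, 5_000, 10_000):
    print(f"{steps:>6} steps = {steps_to_epochs(steps):.2f} epochs")
```

Under these assumptions, 10,000 steps correspond to roughly 18 passes over the data.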

Tensorflow Tutorial #2 - Deep MNIST for Experts

We're following this tutorial: https://www.tensorflow.org/versions/r1.2/get_started/mnist/pros

What we will accomplish in this tutorial:

  • Create a softmax regression function that is a model for recognizing MNIST digits, based on looking at every pixel in the image
  • Use Tensorflow to train the model to recognize digits by having it "look" at thousands of examples (and run our first Tensorflow session to do so)
  • Check the model's accuracy with our test data
  • Build, train, and test a multilayer convolutional neural network to improve the results

Build a Softmax Regression Model

In this section we will build a softmax regression model with a single linear layer.

In [14]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Data processing
# Here mnist is a lightweight class which stores the training, validation,
# and testing sets as NumPy arrays. It also provides a function for iterating
# through data minibatches, which we will use below.
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# TF Session
# InteractiveSession allows you to interleave operations which build a computation graph 
# with ones that run the graph. This is particularly convenient when working in 
# interactive contexts like IPython.
sess = tf.InteractiveSession()

# TF Placeholders
# Placeholders are values we input when asking TF to run a computation
# The input images `x` will consist of a 2d tensor of floating point numbers.
# Here we assign it a shape of [None, 784], where 784 is the dimensionality
# of a single flattened 28 by 28 pixel MNIST image, and None indicates that
# the first dimension, corresponding to the batch size, can be of any size.
# The target output classes y_ will also consist of a 2d tensor, where each
# row is a one-hot 10-dimensional vector indicating which digit class
# (zero through nine) the corresponding MNIST image belongs to.
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

# TF Variables
# A Variable is a value that lives in TensorFlow's computation graph.
# It can be used and even modified by the computation.
# The model parameters are generally variables also.
# We initialize both as tensors full of zeros.
# `W` is a 784x10 matrix (because we have 784 input features and 10 outputs)
# `b` is a 10-dimensional vector (because we have 10 classes)
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

# Initialize variables
# Before Variables can be used within a session, they must be initialized using that session.
# This step takes the initial values (in this case tensors full of zeros) that have already
# been specified, and assigns them to each Variable.
# This can be done for all Variables at once:
sess.run(tf.global_variables_initializer())

# Regression Model
# "input times weight, add a bias, activate" - @Siraj Raval
# We multiply the vectorized input images `x` by the weight matrix `W`, add the bias `b`.
y = tf.matmul(x,W) + b

# Loss function
# Loss indicates how bad the model's prediction was on a single example;
# we try to minimize that while training across all the examples. Here,
# our loss function is the cross-entropy between the target and the softmax
# activation function applied to the model's prediction.
# Note that tf.nn.softmax_cross_entropy_with_logits internally applies the
# softmax on the model's unnormalized model prediction and sums across all
# classes, and tf.reduce_mean takes the average over these sums.
# As in the beginners tutorial, we use the stable formulation:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=y))

# Train the model
# Because TensorFlow knows the entire computation graph, it can use automatic
# differentiation to find the gradients of the loss with respect to each of the
# variables.
# TensorFlow has a variety of built-in optimization algorithms.
# For this example, we will use steepest gradient descent, with a step length of 0.5,
# to descend the cross entropy.
# What TensorFlow actually does in this single line is add new operations to the
# computation graph. These operations include ones to compute gradients, compute
# parameter update steps, and apply update steps to the parameters.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Run the training steps
# The returned operation train_step, when run, will apply the gradient descent updates
# to the parameters. Training the model can therefore be accomplished by repeatedly
# running train_step.
for _ in range(1000):
  # We load 100 training examples in each training iteration  
  batch = mnist.train.next_batch(100)
  # We then run the train_step operation, using feed_dict to replace the placeholder
  # tensors x and y_ with the training examples.
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

# Evaluate the model

# tf.argmax gives you the index of the highest entry in a tensor along some axis.
# For example, tf.argmax(y,1) is the label our model thinks is most likely for each input,
# while tf.argmax(y_,1) is the true label.
# We can use tf.equal to check if our prediction matches the truth.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

# That gives us a list of booleans. To determine what fraction are correct, we cast to
# floating point numbers and then take the mean.
# For example, [True, False, True, True] would become [1,0,1,1] which would become 0.75.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Finally, we can evaluate our accuracy on the test data. This should be about 92% correct.
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

You should see something close to 0.9188 (92%).

So far, we've recreated the model from the beginners tutorial.

We have our model up to 92% accuracy. I think that's quite good considering how little we've done, but it looks like we can do better.

Build a Multilayer Convolutional Network

In this section we'll build a small convolutional neural network.

This will get us to around 99.2% accuracy!
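The network in the next cell relies on some shape bookkeeping: a stride-1 convolution with 'SAME' padding keeps the 28x28 size, and each 2x2 max pool halves it, giving 14x14 and then 7x7. The NumPy sketch below checks that arithmetic. The pooling function is a simplified stand-in written for this note (it assumes even height and width) and is not TensorFlow's implementation; zero-filled arrays stand in for the convolution outputs.

```python
import numpy as np

def max_pool_2x2(images):
    """2x2 max pooling with stride 2 on a [batch, height, width, channels] array.

    Assumes even height and width, so 'SAME' padding would add nothing.
    """
    n, h, w, c = images.shape
    # Split height and width into (blocks, 2) pairs, then take the max of each 2x2 block
    blocks = images.reshape(n, h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(2, 4))

# A stride-1 'SAME' convolution keeps height and width; only the channel count changes
h_conv1 = np.zeros((1, 28, 28, 32))
h_pool1 = max_pool_2x2(h_conv1)           # -> (1, 14, 14, 32)
h_conv2 = np.zeros((1, 14, 14, 64))
h_pool2 = max_pool_2x2(h_conv2)           # -> (1, 7, 7, 64)
h_flat = h_pool2.reshape(-1, 7 * 7 * 64)  # -> (1, 3136)
print(h_pool1.shape, h_pool2.shape, h_flat.shape)
```

The final 3136 is the 7 * 7 * 64 that appears as the input size of the densely connected layer.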

In [15]:
# Weight Initialization
# To create this model, we're going to need to create a lot of weights and biases.
# One should generally initialize weights with a small amount of noise for symmetry
# breaking, and to prevent 0 gradients. Since we're using ReLU neurons, it is also
# good practice to initialize them with a slightly positive initial bias to avoid
# "dead neurons". Instead of doing this repeatedly while we build the model,
# let's create two handy functions to do it for us.
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

# Convolution and Pooling
# TensorFlow also gives us a lot of flexibility in convolution and pooling operations.
# How do we handle the boundaries? What is our stride size?
# In this example, we're always going to choose the vanilla version.
# Our convolutions use a stride of one and are zero padded so that the output is the 
# same size as the input.
# Our pooling is plain old max pooling over 2x2 blocks.
# We'll also abstract those operations into functions.
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# First Convolutional Layer
# Our first layer will consist of a convolution, followed by max pooling.
# The convolution will compute 32 features for each 5x5 patch.
# Its weight tensor will have a shape of [5, 5, 1, 32].
# The first two dimensions are the patch size (5)
# The next is the number of input channels (1)
# The last is the number of output channels (32)
# We will also have a bias vector with a component for each output channel.
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

# To apply the layer, we first reshape x to a 4d tensor, with the second and third dimensions
# corresponding to image width and height, and the final dimension corresponding to the
# number of color channels.
x_image = tf.reshape(x, [-1, 28, 28, 1])

# We then convolve x_image with the weight tensor, add the bias, apply the ReLU function,
# and finally max pool.
# The max_pool_2x2 method will reduce the image size to 14x14.
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second Convolutional Layer
# In order to build a deep network, we stack several layers of this type.
# The second layer will have 64 features for each 5x5 patch.
# The max_pool_2x2 method will reduce the image size to 7x7.
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Densely Connected Layer
# Now that the image size has been reduced to 7x7, we add a fully-connected layer
# with 1024 neurons to allow processing on the entire image.
# We reshape the tensor from the pooling layer into a batch of vectors,
# multiply by a weight matrix, add a bias, and apply a ReLU.
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout
# To reduce overfitting, we will apply dropout before the readout layer.
# We create a placeholder for the probability that a neuron's output is kept during dropout.
# This allows us to turn dropout on during training, and turn it off during testing.
# TensorFlow's tf.nn.dropout op automatically handles scaling neuron outputs in addition
# to masking them, so dropout just works without any additional scaling.
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Readout Layer
# Finally, we add a layer, just like for the one layer softmax regression above.
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

# Train and evaluate
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=y_conv))

# We will replace the steepest gradient descent optimizer with the more sophisticated ADAM optimizer.
# We will add logging to every 100th iteration in the training process.
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

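if False: # Don't execute for compiling blog    
    # We will also use tf.Session rather than tf.InteractiveSession.
    # This better separates the process of creating the graph (model specification)
    # and the process of evaluating the graph (model fitting).
    with tf.Session() as sess:
      # Variables must be initialized in this new session before training
      sess.run(tf.global_variables_initializer())
      for i in range(2000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
          train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
          print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

      # Save the model --------------------------------------------
      # Build the signature_def_map.
      # Note: the CNN's output is y_conv; `y` belongs to the softmax regression model above.
      classification_inputs = tf.saved_model.utils.build_tensor_info(x)
      classification_outputs_classes = tf.saved_model.utils.build_tensor_info(tf.argmax(y_conv, 1))
      classification_outputs_scores = tf.saved_model.utils.build_tensor_info(y_conv)

      classification_signature = (
          tf.saved_model.signature_def_utils.build_signature_def(
            inputs={tf.saved_model.signature_constants.CLASSIFY_INPUTS: classification_inputs},
            outputs={tf.saved_model.signature_constants.CLASSIFY_OUTPUT_CLASSES: classification_outputs_classes,
                tf.saved_model.signature_constants.CLASSIFY_OUTPUT_SCORES: classification_outputs_scores},
            method_name=tf.saved_model.signature_constants.CLASSIFY_METHOD_NAME))

      tensor_info_x = tf.saved_model.utils.build_tensor_info(x)
      tensor_info_y = tf.saved_model.utils.build_tensor_info(y_conv)
      # keep_prob has no default value, so it is exposed as an input (feed 1.0 at inference)
      tensor_info_keep_prob = tf.saved_model.utils.build_tensor_info(keep_prob)

      prediction_signature = (
          tf.saved_model.signature_def_utils.build_signature_def(
              inputs={'images': tensor_info_x, 'keep_prob': tensor_info_keep_prob},
              outputs={'scores': tensor_info_y},
              method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))

      export_path = './export_model'
      print('Exporting trained model to: %r' % export_path)
      builder = tf.saved_model.builder.SavedModelBuilder(export_path)
      # The signature keys below are assumed, following the TensorFlow Serving MNIST example
      builder.add_meta_graph_and_variables(
              sess, [tf.saved_model.tag_constants.SERVING],
              signature_def_map={
                  'predict_images': prediction_signature,
                  tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: classification_signature})
      builder.save()
      # Complete --------------------------------------------

      print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))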
W0921 10:14:11.835332 4598717888 deprecation.py:506] From <ipython-input-15-38e72f5cab85>:106: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.

I've disabled running the model with the if flag because it takes too long when I compile my blog, but you would see the following:

step 0, training accuracy 0.04
step 100, training accuracy 0.9
step 200, training accuracy 0.94
step 300, training accuracy 0.92
step 400, training accuracy 0.86
step 500, training accuracy 0.94
step 600, training accuracy 0.86
step 700, training accuracy 0.92
step 800, training accuracy 0.94
step 900, training accuracy 0.92
step 1000, training accuracy 0.92
step 1100, training accuracy 0.96
step 1200, training accuracy 0.96
step 1300, training accuracy 0.98
step 1400, training accuracy 0.9
step 1500, training accuracy 0.96
step 1600, training accuracy 0.96
step 1700, training accuracy 1
step 1800, training accuracy 0.94
step 1900, training accuracy 0.96
test accuracy 0.9745

Over 2,000 steps the training accuracy climbs, and the test accuracy ends at about 97%.

We should reach around 99.2% with 20,000 training steps, which is how long the original tutorial trains for.
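One detail from the network above that is easy to miss is how dropout keeps the expected activation unchanged: values that survive are scaled up by 1 / keep_prob. The NumPy sketch below illustrates that idea on made-up layer outputs. It is an illustration only, not the code behind tf.nn.dropout.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Zero each value with probability 1 - keep_prob and rescale the rest."""
    if keep_prob >= 1.0:
        return activations  # dropout switched off, as when evaluating
    mask = rng.random(activations.shape) < keep_prob
    return np.where(mask, activations / keep_prob, 0.0)

rng = np.random.default_rng(0)
h = np.ones((1000, 100))  # made-up layer outputs

train_out = dropout(h, keep_prob=0.5, rng=rng)
test_out = dropout(h, keep_prob=1.0, rng=rng)

print(np.unique(train_out))  # kept units are scaled to 2.0, dropped ones are 0.0
print(train_out.mean())      # close to 1.0, the mean without dropout
print(test_out.mean())       # exactly 1.0
```

This is why the training loop feeds keep_prob: 0.5 while the accuracy checks feed keep_prob: 1.0 without any extra rescaling.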

I'm starting to understand how this all comes together. I think I need to get more practical experience creating models with my own data set.

Saving the model for export

I've edited the code above to save the model in pb (protocol buffer) format.

You can use tensorflow serving to deploy your model for the web or locally. There's a great notebook and video here:

You can view an iOS implementation example here; note that it uses a pb model.

This YouTube video explains the process.

Check out the docs:

Running the model on iOS

The next step is to take our pb file and use it on a mobile device. I'm aiming for iOS.

You can do this in two ways: use TensorFlow directly (import it natively, then load and run your model), or convert the model to CoreML and import it through Xcode.

Using the Tensorflow library

When using TensorFlow you have to write the signature definition of your model yourself. I think you also have to convert it for TensorFlow Lite.

More info here: https://www.tensorflow.org/mobile/ios_build

Converting to CoreML

CoreML allows you to import a model into Xcode and have it automatically detect the input and output types through the signature provided by the model.

You can convert a model using tfcoreml like this.

import tfcoreml as tf_converter

I was following this tutorial for integrating a TensorFlow model in an iOS app, but I'm having a few issues which I'm debugging now.

Check out GitHub for examples of how others have done it. Here are a few links I found.