Tensorflow MNIST Beginner notes¶
Introduction¶
- Softmax regression
- Cross entropy
Introduction¶
Here’s my Jupyter notebook for this lesson.
Tensorflow Tutorial #1 - MNIST For ML Beginners¶
TensorFlow lets us describe a graph of interacting operations that run entirely outside Python.
Note: all blockquotes in this notebook are quotes from the original source.
First we need to import TensorFlow and our training dataset.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Input data and Placeholders¶
In this graph, we describe the interacting operations by manipulating symbolic variables called placeholders. A placeholder is a way to input values into our computation graph.
Our images are 28x28 pixels, flattened into 784-dimensional vectors, so our input is a 2-D tensor of floating point numbers with a shape of [None, 784]. None means that a dimension can be of any length (here, the batch size).
# X is our placeholder value for our input
x = tf.placeholder(tf.float32, [None, 784])
Weights and biases and Variables¶
We need weights and biases for our model; these are created using tf.Variable. Unlike placeholders, Variables can be modified during the computation.
W has a shape of [784, 10] because we want to multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence for the different classes.
b has a shape of [10] so we can add it to the output.
We initialize both as tensors full of zeros.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
Create our model¶
Next we create our model. Here's an explanation from the TensorFlow docs as to what's happening:
First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had $Wx$, as a small trick to deal with x being a 2D tensor with multiple inputs. We then add b, and finally apply tf.nn.softmax.
(The flip works out shape-wise: x is [None, 784] and W is [784, 10], so tf.matmul(x, W) gives a [None, 10] tensor of evidence, one row per input image.)
y = tf.nn.softmax(tf.matmul(x, W) + b)
Training¶
We train models with an objective to reduce the loss/error/cost. We want to minimize this error to make our model more accurate.
A very common function to determine the loss of a model is called "cross-entropy".
\begin{align} H_{y'}(y) = -\sum_i y'_i \log(y_i) \end{align}
The cross-entropy measures how inefficient our predictions are for describing the truth, where:
$y$ is our predicted probability distribution
$y'$ is the true distribution (the one-hot vector with the digit labels)
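As a quick sanity check, here's the formula computed by hand with NumPy (a minimal sketch; the example numbers are made up):
import numpy as np
y_true = np.array([0, 0, 1])           # one-hot true distribution y'
y_pred = np.array([0.1, 0.2, 0.7])     # predicted probability distribution y
print(-np.sum(y_true * np.log(y_pred)))  # ~0.357; a perfect prediction would give 0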
To implement cross-entropy we need to first add a new placeholder to input the correct answers.
y_ = tf.placeholder(tf.float32, [None, 10])
Next we implement the cross-entropy function:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
First, tf.log computes the logarithm of each element of y. Next, we multiply each element of y_ with the corresponding element of tf.log(y). Then tf.reduce_sum adds the elements in the second dimension of y, due to the reduction_indices=[1] parameter. Finally, tf.reduce_mean computes the mean over all the examples in the batch.
Note that in the source code, we don't use this formulation, because it is numerically unstable. Instead, we apply tf.nn.softmax_cross_entropy_with_logits on the unnormalized logits (e.g., we call softmax_cross_entropy_with_logits on tf.matmul(x, W) + b), because this more numerically stable function internally computes the softmax activation. In your code, consider using tf.nn.softmax_cross_entropy_with_logits instead.
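For reference, the stable formulation would look like this (a minimal sketch using the x, W, b, and y_ defined above; it mirrors what we do in tutorial #2 below):
logits = tf.matmul(x, W) + b  # unnormalized logits, softmax not yet applied
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=logits))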
Because TensorFlow knows the entire graph of your computations, it can automatically use the backpropagation algorithm to efficiently determine how your variables affect the loss you ask it to minimize. Then it can apply your choice of optimization algorithm to modify the variables and reduce the loss.
Next, we ask TensorFlow to minimize cross_entropy using the gradient descent algorithm with a learning rate of 0.5.
What TensorFlow actually does here, behind the scenes, is to add new operations to your graph which implement backpropagation and gradient descent. Then it gives you back a single operation which, when run, does a step of gradient descent training, slightly tweaking your variables to reduce the loss.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
Run our model¶
Now that we've set up and defined our model, we can launch it in a session.
# Create an InteractiveSession
sess = tf.InteractiveSession()
# Create an operation to initialize the variables we created
tf.global_variables_initializer().run()
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
Evaluation¶
We need to evaluate our model to see how well it did.
First let's figure out where we predicted the correct label.
tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the correct label. We can use tf.equal to check if our prediction matches the truth.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
That gives us a list of booleans.
To determine what fraction are correct, we cast to floating point numbers and then take the mean.
For example, [True, False, True, True] would become [1,0,1,1] which would become 0.75.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Conclusion¶
We've learned about Softmax regression and created, trained and evaluated a model.
We can see in the table below that we hit around 92% accuracy after 10,000 training steps (batches of 100 examples).
We need a better model to increase this further, which we'll do next.
import pandas
from IPython.display import display, HTML
data = [["500",0.8908],
["1,000",0.901],
["2,000",0.9143],
["5,000",0.9181],
["10,000",0.9213]]
df = pandas.DataFrame(data, ['500', '1,000', '2,000', '5,000', '10,000'], ['Epochs', 'Percentage'])
display(HTML(df.to_html(index=False)))
Tensorflow Tutorial #2 - Deep MNIST for Experts¶
We're following this tutorial: https://www.tensorflow.org/versions/r1.2/get_started/mnist/pros
What we will accomplish in this tutorial:
- Create a softmax regression function that is a model for recognizing MNIST digits, based on looking at every pixel in the image
- Use TensorFlow to train the model to recognize digits by having it "look" at thousands of examples (and run our first TensorFlow session to do so)
- Check the model's accuracy with our test data
- Build, train, and test a multilayer convolutional neural network to improve the results
Build a Softmax Regression Model¶
In this section we will build a softmax regression model with a single linear layer.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
#
# Data processing
#
# Here mnist is a lightweight class which stores the training, validation,
# and testing sets as NumPy arrays. It also provides a function for iterating
# through data minibatches, which we will use below.
#
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
#
# TF Session
#
# InteractiveSession allows you to interleave operations which build a computation graph
# with ones that run the graph. This is particularly convenient when working in
# interactive contexts like IPython.
#
sess = tf.InteractiveSession()
#
# TF Placeholders
#
# Placeholders are values we input when asking TF to run a computation
#
# The input images `x` will consist of a 2d tensor of floating point numbers.
# Here we assign it a shape of [None, 784], where 784 is the dimensionality
# of a single flattened 28 by 28 pixel MNIST image, and None indicates that
# the first dimension, corresponding to the batch size, can be of any size.
# The target output classes y_ will also consist of a 2d tensor, where each
# row is a one-hot 10-dimensional vector indicating which digit class
# (zero through nine) the corresponding MNIST image belongs to.
#
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
#
# TF Variables
#
# A Variable is a value that lives in TensorFlow's computation graph.
# It can be used and even modified by the computation.
#
# The model parameters are generally Variables as well.
#
# We initialize both as tensors full of zeros.
#
# `W` is a 784x10 matrix (because we have 784 input features and 10 outputs)
# `b` is a 10-dimensional vector (because we have 10 classes)
#
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
#
# Initialize variables
#
# Before Variables can be used within a session, they must be initialized using that session.
# This step takes the initial values (in this case tensors full of zeros) that have already
# been specified, and assigns them to each Variable.
#
# This can be done for all Variables at once:
#
sess.run(tf.global_variables_initializer())
#
# Regression Model
#
# "input times weight, add a bias, activate" - @Siraj Raval
#
# We multiply the vectorized input images `x` by the weight matrix `W`, add the bias `b`.
y = tf.matmul(x,W) + b
#
# Loss function
#
# Loss indicates how bad the model's prediction was on a single example;
# we try to minimize that while training across all the examples. Here,
# our loss function is the cross-entropy between the target and the softmax
# activation function applied to the model's prediction.
#
# Note that tf.nn.softmax_cross_entropy_with_logits internally applies the
# softmax on the model's unnormalized prediction and sums across all
# classes, and tf.reduce_mean takes the average over these sums.
#
# As in the beginners tutorial, we use the stable formulation:
#
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=y))
#
# Train the model
#
# Because TensorFlow knows the entire computation graph, it can use automatic
# differentiation to find the gradients of the loss with respect to each of the
# variables.
#
# TensorFlow has a variety of built-in optimization algorithms.
#
# For this example, we will use steepest gradient descent, with a step length of 0.5,
# to descend the cross entropy.
#
# What TensorFlow actually does in this single line is add new operations to the
# computation graph. These operations include ones to compute gradients, compute
# parameter update steps, and apply update steps to the parameters.
#
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
#
# Run the training steps
#
# The returned operation train_step, when run, will apply the gradient descent updates
# to the parameters. Training the model can therefore be accomplished by repeatedly
# running train_step.
#
for _ in range(1000):
    # We load 100 training examples in each training iteration
    batch = mnist.train.next_batch(100)
    # We then run the train_step operation, using feed_dict to replace the placeholder
    # tensors x and y_ with the training examples.
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})
#
# Evaluate the model
#
# tf.argmax gives you the index of the highest entry in a tensor along some axis.
#
# For example, tf.argmax(y,1) is the label our model thinks is most likely for each input,
# while tf.argmax(y_,1) is the true label.
#
# We can use tf.equal to check if our prediction matches the truth.
#
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
#
# That gives us a list of booleans. To determine what fraction are correct, we cast to
# floating point numbers and then take the mean.
#
# For example, [True, False, True, True] would become [1,0,1,1] which would become 0.75.
#
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
#
# Finally, we can evaluate our accuracy on the test data. This should be about 92% correct.
#
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
You should see something close to 0.9188 (92%).
So far, we've recreated what we did in the beginner tutorial.
Our model reaches 92% accuracy. I think that's quite good considering how little we've done, but it looks like we can do better.
Build a Multilayer Convolutional Network¶
In this section we'll build a small convolutional neural network.
This will get us to around 99.2% accuracy!
#
# Weight Initialization
#
# To create this model, we're going to need to create a lot of weights and biases.
# One should generally initialize weights with a small amount of noise for symmetry
# breaking, and to prevent 0 gradients. Since we're using ReLU neurons, it is also
# good practice to initialize them with a slightly positive initial bias to avoid
# "dead neurons". Instead of doing this repeatedly while we build the model,
# let's create two handy functions to do it for us.
#
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
#
# Convolution and Pooling
#
# TensorFlow also gives us a lot of flexibility in convolution and pooling operations.
# How do we handle the boundaries? What is our stride size?
# In this example, we're always going to choose the vanilla version.
#
# Our convolutions use a stride of one and are zero padded so that the output is the
# same size as the input.
#
# Our pooling is plain old max pooling over 2x2 blocks.
#
# We'll also abstract those operations into functions.
#
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
#
# First Convolutional Layer
#
# Our first layer will consist of a convolution, followed by max pooling.
#
# The convolution will compute 32 features for each 5x5 patch.
# Its weight tensor will have a shape of [5, 5, 1, 32].
#
# The first two dimensions are the patch size (5)
# The next is the number of input channels (1)
# The last is the number of output channels (32)
#
# We will also have a bias vector with a component for each output channel.
#
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
# To apply the layer, we first reshape x to a 4d tensor, with the second and third dimensions
# corresponding to image width and height, and the final dimension corresponding to the
# number of color channels.
x_image = tf.reshape(x, [-1, 28, 28, 1])
# We then convolve x_image with the weight tensor, add the bias, apply the ReLU function,
# and finally max pool.
# The max_pool_2x2 method will reduce the image size to 14x14.
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
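# (A quick shape sanity check, assuming a batch of N images: x_image is
# [N, 28, 28, 1]; h_conv1 is [N, 28, 28, 32] because SAME padding keeps the
# 28x28 size; h_pool1 is [N, 14, 14, 32] because 2x2 max pooling halves the
# width and height. You can confirm with print(h_pool1.get_shape()).)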
#
# Second Convolutional Layer
#
# In order to build a deep network, we stack several layers of this type.
#
# The second layer will have 64 features for each 5x5 patch.
# The max_pool_2x2 method will reduce the image size to 7x7.
#
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
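# (Tracing shapes again: h_conv2 is [N, 14, 14, 64] and h_pool2 is [N, 7, 7, 64],
# which is why the densely connected layer below expects 7*7*64 = 3136 inputs.)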
#
# Densely Connected Layer
#
# Now that the image size has been reduced to 7x7, we add a fully-connected layer
# with 1024 neurons to allow processing on the entire image.
#
# We reshape the tensor from the pooling layer into a batch of vectors,
# multiply by a weight matrix, add a bias, and apply a ReLU.
#
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
#
# Dropout
#
# To reduce overfitting, we will apply dropout before the readout layer.
# We create a placeholder for the probability that a neuron's output is kept during dropout.
# This allows us to turn dropout on during training, and turn it off during testing.
# TensorFlow's tf.nn.dropout op automatically handles scaling neuron outputs in addition
# to masking them, so dropout just works without any additional scaling.
#
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
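# (An aside to illustrate the scaling mentioned above, assuming an active
# session: kept activations are scaled up by 1/keep_prob so the expected sum
# is unchanged, e.g. sess.run(tf.nn.dropout(tf.ones([4]), keep_prob=0.5))
# might print something like [2., 0., 2., 2.] - each surviving value becomes
# 1/0.5 = 2 and on average half are dropped.)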
#
# Readout Layer
#
# Finally, we add a layer, just like for the one layer softmax regression above.
#
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
#
# Train and evaluate
#
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=y_conv))
# We will replace the steepest gradient descent optimizer with the more sophisticated ADAM optimizer.
# We will add logging to every 100th iteration in the training process.
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
if False: # Don't execute when compiling the blog
    # We will also use tf.Session rather than tf.InteractiveSession.
    # This better separates the process of creating the graph (model specification)
    # and the process of evaluating the graph (model fitting).
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(2000):
            batch = mnist.train.next_batch(50)
            if i % 100 == 0:
                train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
                print('step %d, training accuracy %g' % (i, train_accuracy))
            train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

        # Save the model --------------------------------------------
        # Build the signature_def_map.
        classification_inputs = tf.saved_model.utils.build_tensor_info(x)
        classification_outputs_classes = tf.saved_model.utils.build_tensor_info(y)
        classification_outputs_scores = tf.saved_model.utils.build_tensor_info(y_conv)
        classification_signature = (
            tf.saved_model.signature_def_utils.build_signature_def(
                inputs={tf.saved_model.signature_constants.CLASSIFY_INPUTS: classification_inputs},
                outputs={tf.saved_model.signature_constants.CLASSIFY_OUTPUT_CLASSES: classification_outputs_classes,
                         tf.saved_model.signature_constants.CLASSIFY_OUTPUT_SCORES: classification_outputs_scores},
                method_name=tf.saved_model.signature_constants.CLASSIFY_METHOD_NAME))

        tensor_info_x = tf.saved_model.utils.build_tensor_info(x)
        tensor_info_y = tf.saved_model.utils.build_tensor_info(y)
        prediction_signature = (
            tf.saved_model.signature_def_utils.build_signature_def(
                inputs={'images': tensor_info_x},
                outputs={'scores': tensor_info_y},
                method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))

        export_path = './export_model'
        print('Exporting trained model to: %r' % export_path)
        builder = tf.saved_model.builder.SavedModelBuilder(export_path)
        builder.add_meta_graph_and_variables(
            sess, [tf.saved_model.tag_constants.SERVING],
            signature_def_map={
                'predict_images':
                    prediction_signature,
                tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                    classification_signature,
            })
        builder.save()
        # Complete --------------------------------------------

        print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
I've disabled the running of the model via the if False flag as it takes too long when I compile my blog, but you would see the following:
step 0, training accuracy 0.04
step 100, training accuracy 0.9
step 200, training accuracy 0.94
step 300, training accuracy 0.92
step 400, training accuracy 0.86
step 500, training accuracy 0.94
step 600, training accuracy 0.86
step 700, training accuracy 0.92
step 800, training accuracy 0.94
step 900, training accuracy 0.92
step 1000, training accuracy 0.92
step 1100, training accuracy 0.96
step 1200, training accuracy 0.96
step 1300, training accuracy 0.98
step 1400, training accuracy 0.9
step 1500, training accuracy 0.96
step 1600, training accuracy 0.96
step 1700, training accuracy 1
step 1800, training accuracy 0.94
step 1900, training accuracy 0.96
test accuracy 0.9745
We can see the accuracy increase to around 97% as we go through 2,000 training steps.
We should reach around 99.2% with 20,000 training steps.
I'm starting to understand how this all comes together. I think I need to get more practical experience by creating models with my own data set.
Saving the model for export¶
I've edited the code above to save the model in the protocol buffer (.pb) format.
You can use TensorFlow Serving to deploy your model on the web or locally. There's a great notebook and video here:
- https://www.youtube.com/watch?v=T_afaArR0E8
- https://github.com/llSourcell/How-to-Deploy-a-Tensorflow-Model-in-Production/blob/master/demo.ipynb
You can view an iOS implementation example here; note it uses a .pb model.
This YouTube video explains the process.
Check out the docs:
Running the model on iOS¶
The next step is to take our .pb file and use it on a mobile device; I'm aiming for iOS.
You can do this in two ways: the first is to use TensorFlow directly (import the library natively, then load and run your model), and the second is to convert it to a Core ML model and import it through Xcode.
Using the Tensorflow library¶
When using TensorFlow you have to write the signature definition of your model yourself; I think you also have to convert it for TensorFlow Lite.
More info here: https://www.tensorflow.org/mobile/ios_build
Converting to CoreML¶
Core ML allows you to import a model in Xcode and have it automatically detect the input and output types through the signature provided by the model.
You can convert a model using tfcoreml like this:
import tfcoreml as tf_converter
tf_converter.convert(tf_model_path='export_model/saved_model.pb',
                     mlmodel_path='export_model/my_model.mlmodel',
                     output_feature_names=['softmax:0'])
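Note that output_feature_names expects the name of the output tensor in your graph, so 'softmax:0' may need adjusting to match your exported model. If you're unsure, one way to find candidate names is to list the graph's operations:
# List operation names in the default graph (run this in the graph that built the model)
print([op.name for op in tf.get_default_graph().get_operations()])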
I was following this tutorial for integrating a TensorFlow model in an iOS app, but I'm having a few issues which I'm debugging now.
Check out GitHub for examples of how others have done it; here are a few links I found.