1. Make a Prediction

Overview
• Data parsing and processing using data frames
• Splitting our data into a training and testing set
• Simple linear regression model to find our line of best fit
• Slope formula - $y = mx + b$

Introduction

What is deep learning?

A neural network is a machine learning model. When we create neural networks that are many layers deep, that’s deep learning.

Deep learning is a subset of machine learning that has outperformed other types of model on many tasks, such as image and speech recognition.

3 different styles of learning

Supervised

Labelled data - Many images of labelled cars, to create a model to label cars (using a predefined set to test against and work towards).

Chess example: Gets feedback after every move.

Unsupervised

Unlabelled data - Data set with no labels, it gets no feedback as to what’s right or wrong. It has to learn the structure of the data to solve the task.

Chess example: Never gets feedback - even if it won.

Reinforcement

Interacting with an environment. It isn’t given feedback along the way; it only gets it when it achieves its goal (when a chess game finishes).

Chess example: Only gets feedback if it won the entire game.

• This task uses a supervised approach (with labelled data)
• The type of machine learning task we’ll perform is called regression.

Create a new virtualenv and switch to it.

# create a new virtual environment
mkvirtualenv deeplearning

# switch to it
workon deeplearning

# install dependencies - we must use version 1 of matplotlib
pip install -U numpy pandas scikit-learn scipy 'matplotlib==1.5.1'

Sample brain body data
Brain Body
3.385 44.500
0.480 15.500
1.350 8.100
465.000 423.000

Here’s my jupyter notebook for this lesson.

In [1]:
import warnings
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")

In [2]:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt

# Read our data into a pandas data-frame object
# which is a 2d data structure of rows and columns
# (the read call is missing from this page; 'brain_body.txt' is an assumed filename)
dataframe = pd.read_fwf('brain_body.txt')
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]

# Linear regression finds the relationship between our 2 variables: the line of best fit
# Use scikit-learn's linear_model module to init our linear regression model
body_reg = linear_model.LinearRegression()

# Fit our model on our x,y value pairs
body_reg.fit(x_values, y_values)

# Now we have the line of best fit, we can plot it on a graph

# Scatter plot
plt.scatter(x_values, y_values)

# Plot our regression line
# "for every x value we have, predict the associated y value
#  and draw a line through all those predicted points"
plt.plot(x_values, body_reg.predict(x_values))

# Save it to a file
# plt.savefig('best-fit.png', bbox_inches='tight')

# Show it
plt.show()


The x axis represents brain weights.

The y axis represents body weights.

Challenge dataset

In [1]:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")
from IPython.display import display


Data cleaning

Our data is in a different format: we don't have heading columns, so we need to clean it up.

We also split our data into a training and testing set.

In [2]:
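# Read our data into a pandas data-frame object
# (the read call is missing from this page; the filename below is an assumption)
df = pd.read_csv('challenge_dataset.txt', names=['X', 'Y'])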

# Split our data into a test and training set (so we can calculate the error)
# http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df['X'], df['Y'], test_size=0.1)

"Dataframe length: %s, X_train: %s, X_test: %s, y_train: %s, y_test: %s" % (len(df), len(X_train), len(X_test), len(y_train), len(y_test))

Out[2]:
'Dataframe length: 97, X_train: 87, X_test: 10, y_train: 87, y_test: 10'

X_train, X_test, y_train, y_test are all of type pandas.core.series.Series.

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

Create our model

Create a LinearRegression model and fit our line.

We need to reshape our training data from (87,) to (87,1)

# (87,)
[1,2,3,86,87]

# (87,1)
[[1],[2],[3],[86],[87]]

# np.array([1,2,3,86,87]).shape           # (5,)
# np.array([[1],[2],[3],[86],[87]]).shape # (5,1)


http://cs231n.github.io/python-numpy-tutorial/#array-indexing

In [3]:
# Linear regression finds the relationship between our 2 variables: the line of best fit
# Use scikit-learn's linear_model module to init our linear regression model
reg = linear_model.LinearRegression()

# Fit our model on our x,y value pairs
# https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.reshape.html
x = reg.fit(X_train.values.reshape(-1, 1), y_train.values.reshape(-1, 1))

# Returns the coefficient of determination (R^2) of the prediction on the test data.
score = reg.score(X_test.values.reshape(-1, 1), y_test.values.reshape(-1, 1))

score

Out[3]:
0.80637877650532719

In [4]:
# Now we have the line of best fit, we can plot it on a graph

# Create our horizontal axis ranging from 5-25
x_line = np.arange(5, 25).reshape(-1,1)

# Scatter plot
plt.scatter(df['X'], df['Y'])

# Plot our regression line
# "for every x value we have, predict the associated y value
#  and draw a line through all those predicted points"
plt.plot(x_line, reg.predict(x_line))

# Show it
plt.show()


Slope formula example

Here's a simple example using a small dataset showing the slope formula.

$$y = mx + b$$

This is a table of salary and years worked.

In [5]:
salary_data = {
    'year': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'salary': [40000, 42000, 43000, 46000, 48000, 52000, 58000, 62000, 65500, 68000]
}
df = pd.DataFrame(data=salary_data)
df

Out[5]:
salary year
0 40000 1
1 42000 2
2 43000 3
3 46000 4
4 48000 5
5 52000 6
6 58000 7
7 62000 8
8 65500 9
9 68000 10
In [6]:
# No train/test splitting as we only have 10 values
X_train = df['year'].values.reshape(-1, 1)
y_train = df['salary'].values.reshape(-1, 1)

# Create our linear model
reg = linear_model.LinearRegression()

# Fit our model on our x,y value pairs
x = reg.fit(X_train, y_train)

b = int(round(reg.intercept_[0]))
m = int(round(reg.coef_[0][0]))

"y = {}x + {}".format(m, b)

Out[6]:
'y = 3342x + 34067'

The above should show:

$$y = 3342x + 34067$$

So if we were predicting the salary for someone after 15 years, the formula would be:

$$y = (3342*15) + 34067$$

$$y = 84197$$
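
As a cross-check on the numbers above, here's a minimal sketch that fits the same line by hand in plain Python, using the standard least-squares formulas (slope = covariance of x and y over variance of x). The helper name `fit_line` is my own and isn't part of scikit-learn.

```python
# Closed-form simple linear regression in plain Python (no scikit-learn needed).
years = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
salaries = [40000, 42000, 43000, 46000, 48000, 52000, 58000, 62000, 65500, 68000]


def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the line always passes through (mean_x, mean_y)
    b = mean_y - m * mean_x
    return m, b


m, b = fit_line(years, salaries)
m_rounded, b_rounded = int(round(m)), int(round(b))

# Salary predicted after 15 years using the rounded formula
prediction = m_rounded * 15 + b_rounded

print("y = {}x + {}".format(m_rounded, b_rounded))  # y = 3342x + 34067
print(prediction)  # 84197
```

Because m and b are rounded before predicting, the result matches the hand calculation; predicting with the unrounded values would differ slightly.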

2. Linear Regression using Gradient Descent

Overview
• Hyperparameters
• Sum of squared errors
• Local minima
• Partial derivative

We use gradient descent to find the line of best fit in our data, which we can then use to make a prediction. The gradient tells us the direction (positive or negative) in which to update our current b and m values at each time step, until we reach our local minimum (lowest error).

Linear regression is plain machine learning; there is no neural network.

Linear regression using gradient descent is used everywhere in ML and DL, so it’s important to understand the core concept.

Linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X.

The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression.

Wikipedia: Linear regression

Hyperparameters

In the context of machine learning, hyperparameters are parameters whose values are set prior to the commencement of the learning process. By contrast, the values of other parameters are derived via training.

Wikipedia: Hyperparameter (machine learning)

Examples of hyper-parameters include:

• Learning rate (can be dynamic)
• Number of hidden layers in a deep neural network

Local Minima

If we find the smallest error rate (our local minimum), that’ll also give us the y-intercept and slope.

We can use these in our $y=mx+b$ equation to get our line of best fit, which we can then use to make a prediction.

The way we get the smallest error is by calculating the gradient.

The gradient gives us the direction to move the slope and intercept. It tells us whether to move positive (up) or negative (down). After completing our iterations, we should find our local minimum: the point with the lowest error.

If a function’s gradient can be computed using partial derivatives, it’s known as a differentiable function, and we can optimize it.

The gradient is the slope of the tangent line (a line that touches the function at one point only).

To calculate the gradient, we have to compute the partial derivatives with respect to our values b and m.

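Derivatives and partial derivatives

Writing the error over our $N$ data points as $E = \frac{1}{N}\sum_{i=1}^{N} (y_i - (mx_i + b))^2$, calculate the partial derivative with respect to m:

$\frac{\partial E}{\partial m} = \frac{2}{N}\sum_{i=1}^{N} -x_i(y_i - (mx_i + b))$

Calculate the partial derivative with respect to b:

$\frac{\partial E}{\partial b} = \frac{2}{N}\sum_{i=1}^{N} -(y_i - (mx_i + b))$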
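
Here's a minimal sketch of how those two partial derivatives can drive gradient descent, in plain Python. The function name `step_gradient`, the toy data and the learning rate are all my own assumptions for illustration; the data lies exactly on y = 2x + 1 so we know what the answer should be.

```python
def step_gradient(m, b, points, learning_rate):
    """One gradient descent update for the line y = m*x + b."""
    n = float(len(points))
    m_gradient = 0.0
    b_gradient = 0.0
    for x, y in points:
        residual = y - (m * x + b)
        # Accumulate the two partial derivatives, term by term
        m_gradient += (2 / n) * -x * residual
        b_gradient += (2 / n) * -residual
    # Move against the gradient to reduce the error
    new_m = m - learning_rate * m_gradient
    new_b = b - learning_rate * b_gradient
    return new_m, new_b


# Hypothetical toy data lying exactly on y = 2x + 1
points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]

m, b = 0.0, 0.0  # starting guesses
for _ in range(5000):
    m, b = step_gradient(m, b, points, learning_rate=0.05)

print(round(m, 3), round(b, 3))  # should be close to 2.0 and 1.0
```

The learning rate and iteration count here are hyperparameters picked by hand; a rate that is too large makes the updates overshoot and diverge.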

Sum of squared errors

We want to compute the error (also called the loss; the two mean the same thing) for the current time step. We use the sum of squared errors to do this.
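
A minimal sketch of that error calculation in plain Python, assuming our data is a list of (x, y) pairs (the function name and the sample points are hypothetical). Dividing the sum by the number of points gives the mean squared error, which is where the $\frac 2N$ factor in the partial derivatives comes from.

```python
def sum_of_squared_errors(m, b, points):
    """Sum of squared differences between each actual y and the predicted m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


points = [(1, 2), (2, 4), (3, 7)]  # hypothetical (x, y) pairs

# With m=2, b=0 the predictions are 2, 4, 6, so the residuals are 0, 0, 1
sse = sum_of_squared_errors(2, 0, points)
mse = sse / float(len(points))

print(sse)  # 1
print(mse)
```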

3. Make a neural network

Overview
• The McCulloch-Pitts Model of Neuron (1943)
• The Perceptron
• Forward propagation
• Backpropagate to update weights

Introduction

Let’s create our own neural network from scratch using just numpy.

Lesson #3 - Create a neural network

In [1]:
import numpy as np
from numpy import exp, array, random, dot
import matplotlib.pyplot as plt

In [2]:
import numpy as np

# Custom logger to log one-line arrays
def logger(msg, log_array, nl=False):
    if not isinstance(log_array, np.ndarray):
        # No array to show: just print the message
        print('{message}'.format(message=msg))
    else:
        string_array = ', '.join(map(str, log_array))
        print('{message: <{fill}}: {array}'.format(message=msg, array=string_array, fill='35'))

    if nl is True:
        print('\n')

In [3]:
class NeuralNetwork():
    def __init__(self):
        logger('Init Network', False)

        # Seed the random generator so we get the same numbers each time
        np.random.seed(1)

        # We model a single neuron with 3 input connections and 1 output connection.
        # We assign random weights to a 3 x 1 matrix, with values in the range -1 to 1
        # and a mean of 0
        self.synaptic_weights = 2 * np.random.random((3, 1)) - 1
        logger('Starting random synaptic weights', self.synaptic_weights, True)

    # The Sigmoid function, which describes an S shaped curve.
    # We pass the weighted sum of the inputs through this function to
    # normalise them between 0 and 1.
    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    # The derivative of the Sigmoid function.
    # This is the gradient of the Sigmoid curve, written in terms of the
    # Sigmoid's output: x here is already sigmoid(z), not the raw weighted sum.
    # It indicates how confident we are about the existing weight.
    def __sigmoid_derivative(self, x):
        return x * (1 - x)

    # We train the neural network through a process of trial and error.
    # Adjusting the synaptic weights each time.
    def train(self, training_set_inputs, training_set_outputs, number_of_training_iterations):
        for iteration in range(number_of_training_iterations):
            # Pass the training set through our neural network (a single neuron).
            output = self.predict(training_set_inputs)

            if (iteration % 200 == 0):
                logger('Iteration: %s' % iteration, False)
                logger('Output', output)

            # Calculate the error (The difference between the desired output
            # and the predicted output).
            error = training_set_outputs - output

            # Multiply the error by the input and again by the gradient of the Sigmoid curve.
            # This means less confident weights are adjusted more.
            # This means inputs, which are zero, do not cause changes to the weights.
            adjustment = np.dot(training_set_inputs.T, error * self.__sigmoid_derivative(output))

            # Adjust the weights.
            self.synaptic_weights += adjustment

            if (iteration % 200 == 0):
                logger('Error', error)
                logger('Adjustment', adjustment, True)

    def predict(self, inputs):
        # Pass inputs through our neural network (our single neuron).
        return self.__sigmoid(np.dot(inputs, self.synaptic_weights))

if __name__ == '__main__':

    # Initialise our single neuron neural network
    neural_network = NeuralNetwork()

    # The training set
    # We have 4 examples, each consisting of 3 input values and 1 output value
    training_set_inputs = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
    training_set_outputs = np.array([[0, 1, 1, 0]]).T

    logger('Training set inputs :', training_set_inputs)
    logger('Training set outputs:', training_set_outputs, True)

    # Train the network using a training set
    # Do it 1,001 times (iterations 0 to 1000) and make small adjustments each time
    neural_network.train(training_set_inputs, training_set_outputs, 1001)

    logger('Synaptic weights after training', neural_network.synaptic_weights)

    # Test the network with a new situation
    logger('Making a prediction with [1, 0, 0]', neural_network.predict(np.array([1, 0, 0])), True)

    logger('[Complete]', False)

Init Network
Starting random synaptic weights   : [-0.16595599], [ 0.44064899], [-0.99977125]

Training set inputs :              : [0 0 1], [1 1 1], [1 0 1], [0 1 1]
Training set outputs:              : [0], [1], [1], [0]

Iteration: 0
Output                             : [ 0.2689864], [ 0.3262757], [ 0.23762817], [ 0.36375058]
Error                              : [-0.2689864], [ 0.6737243], [ 0.76237183], [-0.36375058]
Adjustment                         : [ 0.28621005], [ 0.06391297], [ 0.14913351]

Iteration: 200
Output                             : [ 0.07532702], [ 0.93886839], [ 0.95022027], [ 0.06151182]
Error                              : [-0.07532702], [ 0.06113161], [ 0.04977973], [-0.06151182]
Adjustment                         : [ 0.00586329], [ -4.23410172e-05], [-0.00293442]

Iteration: 400
Output                             : [ 0.05175173], [ 0.9579657], [ 0.96596069], [ 0.04198947]
Error                              : [-0.05175173], [ 0.0420343], [ 0.03403931], [-0.04198947]
Adjustment                         : [ 0.00281185], [  3.52947802e-06], [-0.00141687]

Iteration: 600
Output                             : [ 0.0416479], [ 0.96615046], [ 0.97260506], [ 0.03375824]
Error                              : [-0.0416479], [ 0.03384954], [ 0.02739494], [-0.03375824]
Adjustment                         : [ 0.00183693], [  5.85956349e-06], [-0.00092652]

Iteration: 800
Output                             : [ 0.03574306], [ 0.97093628], [ 0.97647479], [ 0.02896968]
Error                              : [-0.03574306], [ 0.02906372], [ 0.02352521], [-0.02896968]
Adjustment                         : [ 0.00136057], [  5.22015424e-06], [-0.00068627]

Iteration: 1000
Output                             : [ 0.03176745], [ 0.97416005], [ 0.97907779], [ 0.02575143]
Error                              : [-0.03176745], [ 0.02583995], [ 0.02092221], [-0.02575143]
Adjustment                         : [ 0.00107903], [  4.39031308e-06], [-0.00054414]

Synaptic weights after training    : [ 7.26390912], [-0.21614179], [-3.41757429]
Making a prediction with [1, 0, 0] : 0.999300125303

[Complete]


Lesson #3 - Challenge

We need to modify our existing code to add 2 new layers to it.

We want to forward propagate our inputs, and back propagate our errors through our network.

Check out Andrew Ng’s course to learn all the math behind this; it’s what I’m currently working through.

Deep learning glossary

Machine Learning

Supervised

Labelled data - Many images of labelled cars, to create a model to label cars (using a predefined set to test against and work towards).

Gets feedback after every move.

Unsupervised

Unlabelled data - Data set with no labels, it gets no feedback as to what’s right or wrong. It has to learn the structure of the data to solve the task.

Never gets feedback - even if it won.

Reinforcement

Interacting with an environment; it only gets feedback when it achieves its goal (a game finishes).

Only gets feedback if it won the game, or scored points.

Perceptron

Perceptrons fit a linear decision boundary. If the data is not linearly separable, this approach will fail and the learning algorithm will not converge. The perceptron is a simple supervised linear model and is used as a building block in neural networks. It updates weights via a simple learning algorithm.

• Start with a low weight $w$
• Choose a point (randomly to start) $i$
• Update w based on $y$ (output), $x$ (input) and the previous weight
• Check the error, update the weights accordingly

The perceptron is an early classifier; neural networks today use more powerful classification methods.
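
The learning steps above can be sketched in plain Python. This is a minimal illustration under my own assumptions (zero starting weights, a step activation, a fixed pass order instead of random picks, and logical AND as the linearly separable toy data); the function names are hypothetical.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, learning_rate=0.1, epochs=50):
    # Start with low (here zero) weights
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # Check the error for this point...
            error = target - predict(weights, bias, x)
            # ...and update each weight based on the input, output and previous weight
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Logical AND: linearly separable, so the perceptron can learn it
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)  # [0, 0, 0, 1]
```

Swapping the labels for XOR (`[0, 1, 1, 0]`) gives data that is not linearly separable, and the weights never settle.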

Neural network

A neural network has an input layer, one or more hidden layers, and an output layer.

Each node in the hidden or output layer has its own classifier. It passes its data to the next hidden layer until eventually it reaches the final output layer, where the results are determined by the scores for each node.

Neural networks were born out of the need to address the inaccuracies of the perceptron. By using a layered web of perceptrons the accuracy of predictions could be improved; this is also known as a multilayer perceptron (MLP).

Feed forward neural network

Signals flow in one direction, from input to output, one layer at a time.

“The Perceptron” by Frank Rosenblatt is a feed forward neural network.

Linear regression

Models relationship between independent & dependent variables to find the line of best fit.

Wikipedia: Linear regression

Forward Propagation

The series of events starting from the input, where the activation is sent to the next layer until it reaches the output, is known as forward propagation.

Back Propagation

Also known as back prop, this is the process of back tracking errors through the weights of the network after forward propagating inputs through the network.

This is done by applying the chain rule from calculus.

Sigmoid

A function used to squash the weighted sum of a node's inputs into the interval (0, 1). Graphed out, this function looks like an ‘S’, which is where it gets its name (the s is sigma in Greek). Also known as the logistic function.
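
As a rough sketch in plain Python (the function names are my own), here is the sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$ together with its slope. Note the slope helper assumes it is handed a value that has already been through the sigmoid, which is the same convention the single neuron code in lesson 3 uses.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative_from_output(s):
    """Slope of the sigmoid, given s = sigmoid(x) rather than x itself."""
    return s * (1.0 - s)


for x in (-5.0, 0.0, 5.0):
    s = sigmoid(x)
    print(x, s, sigmoid_derivative_from_output(s))
```

The slope is largest at x = 0 (where the output is 0.5) and shrinks towards 0 as the output approaches 0 or 1.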

Cost

$Cost = GeneratedOutput - ActualOutput$

Gradient

A gradient describes a slope: the direction we move in to reduce our error rate. For a single variable it’s either positive or negative.

The gradient is the vector of partial derivatives of a function that takes multiple inputs and outputs a single value (i.e. our cost functions in neural networks). It tells us which direction to move in to increase the output the fastest.

We use the gradient and go in the opposite direction since we want to decrease our loss.

Gradient descent

The process of descending our gradient to approach a zero error rate, updating our weight values iteratively.

Normalization

Normalization is the process of rescaling our data onto a common scale relative to the original set of values. This can allow our model to converge faster, as all the values operate on the same scale.

A popular normalization function is min max scaling.

$z = \frac {x-min(x)}{max(x)-min(x)}$
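
A minimal sketch of min max scaling in plain Python, applied to the four sample brain weights from lesson 1. The function name is my own, and returning all zeros when every value is identical is an assumption I've made to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale values so the smallest maps to 0 and the largest maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / float(hi - lo) for v in values]


brain_weights = [3.385, 0.480, 1.350, 465.000]
scaled = min_max_scale(brain_weights)
print(scaled)
```
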
Hyperparameters

Hyperparameters are parameters whose values are set prior to the commencement of the learning process. By contrast, the values of other parameters are derived via training.

Examples include batch size, learning rate, number of iterations, weight decay.

Time step

A training step is one gradient update. In one step, batch_size examples are processed.

Epoch

An epoch consists of one full cycle through the training data. This is usually many steps. As an example, if you have 2,000 images and use a batch size of 10 an epoch consists of 2,000 images / (10 images / step) = 200 steps.
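
The same arithmetic as a tiny sketch. The function name is hypothetical, and rounding up (so a final, smaller batch still counts as a step) is my own assumption; some training loops drop the last partial batch instead.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    # Round up so a final, smaller batch still counts as one step
    return int(math.ceil(num_examples / float(batch_size)))


print(steps_per_epoch(2000, 10))  # 200
```
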

Weights

Weights are the learnable values that control how strongly each input affects the data flowing through the graph. They are updated continuously during training so our results get closer to the expected result with each iteration.

Bias

The bias lets us shift our regression line to better fit the data.

Dot Product

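The dot product multiplies two vectors element by element and sums the results, like applying weight values to input data. The resulting scalar value is the dot product.

If we multiply 2 matrices together, a matrix is returned; this is matrix multiplication, where each entry is the dot product of a row of the first matrix with a column of the second. (The cross product is a different operation, defined on 3-dimensional vectors.)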
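
Here's a small numpy sketch of both cases, using made-up input and weight values. `np.dot` returns a scalar when given two 1-d vectors and a matrix when given two 2-d arrays with compatible shapes.

```python
import numpy as np

# Two vectors: the result is a single number
inputs = np.array([1, 0, 1])
weights = np.array([0.5, -0.2, 0.1])
scalar = np.dot(inputs, weights)  # 1*0.5 + 0*(-0.2) + 1*0.1

# Two matrices: (2, 3) times (3, 1) gives a (2, 1) matrix
batch = np.array([[0, 0, 1], [1, 1, 1]])
weight_matrix = np.array([[0.5], [-0.2], [0.1]])
product = np.dot(batch, weight_matrix)

print(scalar)
print(product.shape)  # (2, 1)
```

Each row of `product` is the dot product of one row of `batch` with the weight column, which is how the single neuron in lesson 3 scores all four training examples at once.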

Local Minima

A local minimum is a point where the error rate is lower than at all nearby points. Finding it will also give us the y-intercept and slope.

https://en.wikipedia.org/wiki/Local_optimum

YOLO
https://www.ted.com/talks/joseph_redmon_how_a_computer_learns_to_recognize_objects_instantly

Algebra

Slope Formula

$y = mx + b$

b is the y-intercept (the value of y when x is 0) and m is the slope, which measures how steep the line is.

We’re looking for a low error (also known as loss).

Tangent Line

A tangent line is a straight line that touches a function at only one point.

The tangent line represents the instantaneous rate of change of the function at that one point. The slope of the tangent line at a point on the function is equal to the derivative of the function at the same point.

Secant line

A secant line is a straight line joining two points on a function. It is also equivalent to the average rate of change, or simply the slope between two points.
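
To see how the two lines relate, here's a minimal sketch using $f(x) = x^2$ as an example function of my own choosing. As the second point of the secant moves closer to the first, the secant slope approaches the slope of the tangent line, which for this function at x = 3 is 6.

```python
def f(x):
    return x ** 2


def secant_slope(func, x0, x1):
    """Average rate of change of func between x0 and x1."""
    return (func(x1) - func(x0)) / float(x1 - x0)


# Shrink the gap h between the two points
for h in (1.0, 0.1, 0.001):
    print(h, secant_slope(f, 3.0, 3.0 + h))
```
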

Linear Algebra

Linear algebra is the study of the properties of vector spaces and matrices.

Scalars

A single number

$x$

Vectors

A 1 dimensional array of numbers

Row

$\begin{bmatrix}a & b\end{bmatrix}$

Column

$\begin{bmatrix}a\\b\end{bmatrix}$

Both have a dimension of [2].

Matrix

A 2 dimensional array of numbers

Rows and columns, each one is a vector

$\begin{bmatrix}a & b & c\\x & y & z\end{bmatrix}$

This has a dimension of [2,3].

Tensor

A tensor is a multi-dimensional array.

A first order tensor is a vector. A second order tensor is a matrix. Tensors of order three or higher are called higher order tensors.
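
In numpy terms, the order of a tensor corresponds to an array's `ndim`, and its dimensions to `shape`. A quick sketch with made-up values:

```python
import numpy as np

scalar = np.array(5)                       # order 0: a single number
vector = np.array([1, 2])                  # order 1: dimension [2]
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # order 2: dimension [2,3]
tensor = np.zeros((2, 3, 4))               # order 3: a stack of matrices

for name, value in [('scalar', scalar), ('vector', vector),
                    ('matrix', matrix), ('tensor', tensor)]:
    print(name, value.ndim, value.shape)
```
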

Multivariable Calculus

Calculus is a branch of mathematics that studies change. It focuses on limits, functions, derivatives, integrals and infinite series.

It is used to solve mathematical problems that cannot be solved by algebra and helps in determining the rate a variable will change in relation to others.

Calculus has two major branches, differential and integral.