Deep Learning From Scratch VI: TensorFlow

Tutorial · 9 Nov 2017

This is part 6 of a series of tutorials, in which we develop the mathematical and algorithmic underpinnings of deep neural networks from scratch and implement our own neural network library in Python, mimicking the TensorFlow API. Start with the first part: I: Computational Graphs.

TensorFlow

It is now time to say goodbye to our own toy library and start to get professional by switching to the actual TensorFlow.

As we’ve learned already, TensorFlow conceptually works exactly the same as our implementation. So why not just stick to our own implementation? There are a couple of reasons:

TensorFlow is the product of years of effort in providing efficient implementations for all the algorithms relevant to our purposes. Fortunately, there are experts at Google whose everyday job is to optimize these implementations. We do not need to know all of these details. We only have to know what the algorithms do conceptually (which we do now) and how to call them.
TensorFlow allows us to train our neural networks on the GPU (graphical processing unit), resulting in an enormous speedup through massive parallelization.
Google is now building Tensor processing units, which are integrated circuits specifically built to run and train TensorFlow graphs, resulting in yet more enormous speedup.
TensorFlow comes pre-equipped with a lot of neural network architectures that would be cumbersome to build on our own.
TensorFlow comes with a high-level API called Keras that allows us to build neural network architectures way easier than by defining the computational graph by hand, as we did up until now.

Update 2019: The new TensorFlow 2 has an updated API. If you would like to migrate the example to TensorFlow 2, you can find information on how to do that here: https://www.tensorflow.org/guide/migrate. The text below assumes the outdated TensorFlow 1 API which is highly analogous to the API we built. Perhaps trying to migrate this example code to TensorFlow 2 is precisely the exercise that will get you started with TensorFlow 2.

So let’s get started. Installing TensorFlow is very easy.

pip install tensorflow

If we want GPU acceleration, we have to install the package tensorflow-gpu:

pip install tensorflow-gpu

In our code, we import it as follows:

import tensorflow as tf

Since the syntax we are used to from the previous sections mimics the TensorFlow syntax, we already know how to use TensorFlow. We only have to make the following changes:

Add tf. to the front of all our function calls and classes
Call session.run(tf.global_variables_initializer()) after building the graph

The rest is exactly the same. Let’s recreate the multi-layer perceptron from the previous section using TensorFlow:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Create two clusters of red points centered at (0, 0) and (1, 1), respectively.
red_points = np.concatenate((
    0.2*np.random.randn(25, 2) + np.array([[0, 0]]*25),
    0.2*np.random.randn(25, 2) + np.array([[1, 1]]*25)
))

# Create two clusters of blue points centered at (0, 1) and (1, 0), respectively.
blue_points = np.concatenate((
    0.2*np.random.randn(25, 2) + np.array([[0, 1]]*25),
    0.2*np.random.randn(25, 2) + np.array([[1, 0]]*25)
))

# Create training input placeholder
X = tf.placeholder(dtype=tf.float64)

# Create placeholder for the training classes
c = tf.placeholder(dtype=tf.float64)

# Build a hidden layer
W_hidden = tf.Variable(np.random.randn(2, 2))
b_hidden = tf.Variable(np.random.randn(2))
p_hidden = tf.sigmoid( tf.add(tf.matmul(X, W_hidden), b_hidden) )

# Build the output layer
W_output = tf.Variable(np.random.randn(2, 2))
b_output = tf.Variable(np.random.randn(2))
p_output = tf.nn.softmax( tf.add(tf.matmul(p_hidden, W_output), b_output) )

# Build cross-entropy loss
J = tf.negative(tf.reduce_sum(tf.reduce_sum(tf.multiply(c, tf.log(p_output)), axis=1)))

# Build minimization op
minimization_op = tf.train.GradientDescentOptimizer(learning_rate = 0.01).minimize(J)

# Build placeholder inputs
feed_dict = {
    X: np.concatenate((blue_points, red_points)),
    c:
        [[1, 0]] * len(blue_points)
        + [[0, 1]] * len(red_points)

}

# Create session
session = tf.Session()

# Initialize variables
session.run(tf.global_variables_initializer())

# Perform 1000 gradient descent steps
for step in range(1000):
    J_value = session.run(J, feed_dict)
    if step % 100 == 0:
        print("Step:", step, " Loss:", J_value)
    session.run(minimization_op, feed_dict)

# Print final result
W_hidden_value = session.run(W_hidden)
print("Hidden layer weight matrix:\n", W_hidden_value)
b_hidden_value = session.run(b_hidden)
print("Hidden layer bias:\n", b_hidden_value)
W_output_value = session.run(W_output)
print("Output layer weight matrix:\n", W_output_value)
b_output_value = session.run(b_output)
print("Output layer bias:\n", b_output_value)

# Visualize classification boundary
xs = np.linspace(-2, 2)
ys = np.linspace(-2, 2)
pred_classes = []
for x in xs:
    for y in ys:
        pred_class = session.run(p_output,
                              feed_dict={X: [[x, y]]})[0]
        pred_classes.append((x, y, pred_class.argmax()))
xs_p, ys_p = [], []
xs_n, ys_n = [], []
for x, y, c in pred_classes:
    if c == 0:
        xs_n.append(x)
        ys_n.append(y)
    else:
        xs_p.append(x)
        ys_p.append(y)
plt.plot(xs_p, ys_p, 'ro', xs_n, ys_n, 'bo')