Practical 5: Neural Networks using TensorFlow

Introduction

In this practical, we will introduce TensorFlow and use it to:

Build a simple neural network with two hidden layers for digit recognition;
Train and evaluate this neural network.

Setting up Google Colab

As installing and configuring TensorFlow on a laptop can be a pain, we recommend using Google Colab for this practical. Click here to run this practical on Google Colab, which requires a Google account.

Resource limits of Google Colab under the free plan:

Memory: up to 12 GB.
Maximum duration of running a notebook: at most 12 hours, depending on availability and your usage patterns, after which the notebook is terminated.
GPU duration: dynamic, up to a few hours. If you use GPU regularly, runtime durations will become shorter and shorter and disconnections more frequent.

Very Important - we will use the GPU on Google Colab to accelerate the model training. To do this, go to ‘Runtime’ -> ‘Change runtime type’ -> Select ‘T4 GPU’ -> Save. See below.

If you are following along in your own development environment, rather than Colab, see the install guide for setting up TensorFlow for development.

Note: if you are using your own development environment, make sure you have upgraded to the latest version of pip before installing the TensorFlow 2 package.

Overview of TensorFlow

TensorFlow is an open source library developed by Google for numerical computation. It is particularly well suited for large-scale machine learning.

TensorFlow is based on the construction of computational graphs. It has evolved considerably since its open source release in 2015. We will use TF2, which offers many additional features built on top of the core features (the most important being tf.keras, discussed in later lectures).

Features of TF

Similar to numpy but with GPU support.
Supports distributed computing.
Includes a kind of just-in-time (JIT) compiler to optimise speed and memory usage.
Computational graphs can be saved and exported.
Supports autodiff and provides numerous advanced optimisers.
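As a rough illustration of the graph and JIT features listed above, the sketch below wraps a small Python function in tf.function, which traces it into a TensorFlow graph the first time it is called. The function name scaled_sum and the input values are made up for this example.

```python
import tensorflow as tf

@tf.function  # traces the Python function into a TensorFlow graph on first call
def scaled_sum(x, y):
    # element-wise product followed by a sum over all elements
    return tf.reduce_sum(x * y)

a = tf.constant([1.0, 2.0, 3.0])
b = tf.constant([4.0, 5.0, 6.0])
result = scaled_sum(a, b)
print(float(result))  # 1*4 + 2*5 + 3*6 = 32.0
```

Subsequent calls with inputs of the same shape and dtype reuse the traced graph rather than re-running the Python code line by line.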

TensorFlow’s Python API

[Credit: Geron]

TensorFlow’s Architecture

[Credit: Geron]

At the lowest level, TensorFlow is implemented in C++ so that it is highly efficient.

We will focus on the Python TensorFlow and Keras interfaces in this practical. In real-world projects that use TensorFlow, you will mostly interact with the Keras interface, but sometimes you might want to use the low-level Python API for greater flexibility.

Hardware

One of the factors responsible for the dramatic recent growth of machine learning and AI is advances in computing power, in particular GPU/TPU hardware that supports high levels of parallelism.

Central Processing Unit (CPU):
- General purpose
- Low latency
- Low throughput
- Sequential
Graphics Processing Unit (GPU):
- Specialised (for graphics initially)
- High latency
- High throughput
- Parallel execution
Tensor Processing Unit (TPU):
- Specialised for matrix operations
- High latency
- Very high throughput
- Extreme parallel execution

In TensorFlow many operations are implemented in low-level kernels, optimised for specific hardware, e.g. CPUs, GPUs, or TPUs.

TensorFlow’s execution engine will ensure operations are run efficiently (across multiple machines and devices if set up accordingly).
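TensorFlow normally chooses the device for each operation itself, but placement can also be requested explicitly. The following is a minimal sketch that pins a small matrix product to the first CPU (a device that exists on every machine); the values are arbitrary.

```python
import tensorflow as tf

# Explicitly place a small computation on the first CPU device.
with tf.device("/CPU:0"):
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)

print(y.device)  # device string of the tensor holding the result
print(y.numpy())
```

On a Colab GPU runtime, replacing "/CPU:0" with "/GPU:0" should place the same computation on the GPU.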

Aside: chips optimised for ML and AI are an active area of development

Key events in GPU history:
- 1999: NVIDIA released the GeForce 256 (marketed as the first GPU), originally for gaming;
- 2007: NVIDIA released CUDA (Compute Unified Device Architecture), a software layer that lets general-purpose software use the parallel processing capabilities of GPUs to perform tasks much faster, including AI and gaming;
- 2022: OpenAI launched ChatGPT, reportedly trained on thousands of NVIDIA GPUs.

Google announced the TPU in 2016.

Graphcore developed the Intelligence Processing Unit (IPU) in 2016.

Groq developed the Language Processing Unit (LPU).

Set up TensorFlow

Import TensorFlow into your program to get started:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.21.0
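Since this practical relies on the Colab GPU, it can be worth confirming that TensorFlow can actually see one. A minimal check, assuming nothing beyond a working TensorFlow install:

```python
import tensorflow as tf

# List the GPUs that TensorFlow can see; an empty list means CPU only.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", len(gpus))
```

If the count is 0 on Colab, the runtime type has probably not been switched to a GPU. The rest of the practical still runs on a CPU, just more slowly.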

Key data type: tensors

The TensorFlow API centres around “tensors” (essentially multi-dimensional arrays), which are similar to NumPy ndarrays.

Tensors

Can construct constant tensors with tf.constant.

tf.constant([[1., 2., 3.], [4., 5., 6.]]) # 2x3 matrix

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

tf.constant(42) # scalar

<tf.Tensor: shape=(), dtype=int32, numpy=42>

Tensors have a shape and data type (dtype).

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])
t.shape

TensorShape([2, 3])

t.dtype

tf.float32
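Shape and dtype can each be changed independently of the other. The sketch below uses tf.cast and tf.reshape on the same 2x3 tensor; the variable names are chosen for this example only.

```python
import tensorflow as tf

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])

t_int = tf.cast(t, tf.int32)     # same shape, dtype changed to int32
t_flat = tf.reshape(t, [-1])     # same dtype, shape changed to (6,)
t_3x2 = tf.reshape(t, [3, 2])    # same six values laid out as 3 rows of 2

print(t_int.dtype, t_flat.shape, t_3x2.shape)
```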

Indexing

Tensor indexing is very similar to numpy.

t[:, 1:]

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>

t[..., 1,  tf.newaxis]

<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [5.]], dtype=float32)>
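The second expression above combines an ellipsis, an integer index and tf.newaxis. The NumPy equivalent below, on an array with the same values, spells out what each part does.

```python
import numpy as np

a = np.array([[1., 2., 3.], [4., 5., 6.]])

col = a[..., 1]                 # "..." keeps all leading axes, 1 picks column 1 -> shape (2,)
col_2d = a[..., 1, np.newaxis]  # np.newaxis appends a length-1 axis -> shape (2, 1)

print(col.shape, col_2d.shape)
```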

Operations

A variety of tensor operations are possible.

t + 10

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>

# square
tf.square(t)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>

# matrix multiplication
t @ tf.transpose(t)

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>
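Reductions are another common family of operations. The sketch below applies tf.reduce_sum and tf.reduce_mean to the same tensor, with and without an axis argument.

```python
import tensorflow as tf

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])

total = tf.reduce_sum(t)               # sum over every element
col_sums = tf.reduce_sum(t, axis=0)    # collapse rows: one sum per column
row_means = tf.reduce_mean(t, axis=1)  # collapse columns: one mean per row

print(total.numpy(), col_sums.numpy(), row_means.numpy())
```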

Tensors and numpy ndarray

Tensors and ndarray are highly compatible, and we can create a tensor from ndarray.

For the differences between tensors and ndarrays, see the TensorFlow tensors documentation and the GeeksforGeeks post. I found the latter more interesting and easier to understand.

a = np.array([2., 4., 5.])
tf.constant(a)

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>

Can convert tensor to ndarray.

t.numpy()

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)

Can apply numpy operations to tensors and vice versa.

np.array(t)

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)

tf.square(a)

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 4., 16., 25.])>

np.square(t)

array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)
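One detail visible in the outputs above is that a tensor built from a NumPy array inherits NumPy's default float64, whereas tensors built from Python floats are float32. A short sketch of two ways to get float32 from the same array:

```python
import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])               # NumPy default: float64
t64 = tf.constant(a)                     # dtype is inherited: float64
t32 = tf.constant(a, dtype=tf.float32)   # request float32 at creation
t32_cast = tf.cast(t64, tf.float32)      # or cast an existing tensor

print(t64.dtype, t32.dtype, t32_cast.dtype)
```

This matters because TensorFlow does not silently convert between dtypes when tensors are combined in an operation.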

Gradients

Note: this section on gradients is optional, and you don’t need to understand it in detail. However, we recommend reading through it, as it is key to understanding how neural networks are trained under the hood.

When training neural networks using gradient descent based approaches, we often need to compute the gradients, in particular, the gradient of the cost function with respect to the model weights.

TensorFlow supports automatic differentiation (Autodiff), which allows gradients to be computed automatically. Below, we compute gradients at a single point analytically, numerically and using TensorFlow’s Autodiff functionality.

Consider the function $ f(w_1, w_2) $ defined as:

\[ f(w_1, w_2) = 3w_1^2 + 2w_1w_2 \]

def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

# example of w1 and w2
w1, w2 = 5.0, 3.0

Computing gradients analytically

def df_dw1(w1, w2):
    return 6 * w1 + 2 * w2
def df_dw2(w1, w2):
    return 2 * w1

df_dw1(w1, w2)

36.0

df_dw2(w1, w2)

10.0

Computing gradients analytically requires deriving and coding a separate derivative function for every parameter. This is infeasible by hand in many cases, e.g. large neural networks with hundreds of thousands or millions of parameters.

Computing gradients numerically

Compute the gradient by finite differences.

eps = 1e-6
(f(w1 + eps, w2) - f(w1, w2)) / eps

36.000003007075065

(f(w1, w2 + eps) - f(w1, w2)) / eps

10.000000003174137

Note - the gradients computed numerically are approximate.
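To see how the approximation error depends on the scheme, the sketch below compares the forward difference used above with a central difference, a standard alternative that is not part of the original practical. It uses the same function and point and needs only plain Python.

```python
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

w1, w2, eps = 5.0, 3.0, 1e-6

# Forward difference: one extra function evaluation, error proportional to eps.
forward = (f(w1 + eps, w2) - f(w1, w2)) / eps
# Central difference: two extra evaluations, error proportional to eps squared.
central = (f(w1 + eps, w2) - f(w1 - eps, w2)) / (2 * eps)

forward_error = abs(forward - 36.0)  # 36.0 is the analytic value of df/dw1
central_error = abs(central - 36.0)
print(forward_error, central_error)
```

Either way, every parameter needs at least one extra evaluation of f, which is why finite differences scale poorly to models with many weights.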

Computing gradients with Autodiff

Autodiff builds derivatives of each stage of the computational graph so that gradients can be computed automatically and efficiently.

w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)

gradients = tape.gradient(z, [w1, w2])

gradients

[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]

Computing gradients with Autodiff requires only one forward and one backward pass through the computational graph, regardless of how many derivatives need to be computed. The results do not suffer from finite-difference approximation error, although they are still limited by machine-precision arithmetic.
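To connect this back to training, here is a minimal sketch of gradient descent driven by tf.GradientTape. The toy cost function, learning rate and number of steps are assumptions chosen for illustration, not values from this practical.

```python
import tensorflow as tf

w = tf.Variable(0.0)
learning_rate = 0.1

for _ in range(50):
    with tf.GradientTape() as tape:
        loss = (w - 3.0) ** 2           # toy cost function with its minimum at w = 3
    grad = tape.gradient(loss, w)       # d(loss)/dw computed by autodiff
    w.assign_sub(learning_rate * grad)  # gradient descent update: w <- w - lr * grad

print(w.numpy())  # close to 3.0
```

Keras optimisers such as Adam follow the same pattern of computing gradients and then updating the variables, with a more elaborate update rule.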

Building a simple neural network using TF

We will build a fully connected neural network with two hidden layers (a.k.a. a multilayer perceptron) with TF. This example uses the high-level Keras Sequential API; the sections above on tensors and gradients cover the low-level mechanics behind building and training the network.

Neural Network Overview

MNIST Dataset Overview

We will train the neural network to identify MNIST handwritten digits. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with values from 0 to 255.

In this example, each image will be converted to float32, normalized to [0, 1] and flattened to a 1-D array of 784 features (28*28).
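The preprocessing described above can be sketched in NumPy alone. The random array below is only a stand-in for a small batch of MNIST images (an assumption for illustration); the conversion steps are the ones named in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a batch of four MNIST images: integers in [0, 255], shape (4, 28, 28).
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

x = images.astype(np.float32) / 255.0  # convert to float32 and scale to [0, 1]
x = x.reshape(len(x), -1)              # flatten each 28x28 image to 784 features

print(x.shape, x.dtype)
```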

More info on MNIST can be found here.

# MNIST dataset parameters.
num_classes = 10 # total classes (0-9 digits).
num_feature_one_dimension = 28 # img shape: 28*28

# Training parameters.
# learning_rate = 0.1
# training_steps = 2000
# batch_size = 256
# display_step = 100

# Network parameters.
n_hidden_1 = 128 # 1st layer number of neurons.
n_hidden_2 = 256 # 2nd layer number of neurons.

Load a dataset

Load and prepare the MNIST dataset. Convert the sample data from integers to floating-point numbers:

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# normalisation - scale the pixel values (integers in the range 0-255) to floats in [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# function for visualising digits
def plot_num(number):

  item_index = np.where(y_train[:1000]==number)
  subset=x_train[item_index]

  egs=5
  fig, axs = plt.subplots(1,egs, figsize=(20,10))

  for i in range(0,egs):
    axs[i].imshow(subset[i])

for x in range(0,10):
  plot_num(x)

Build a machine learning model

Build a tf.keras.Sequential model by stacking layers.

model = tf.keras.models.Sequential([
  # input layer: declares the shape of each image (28*28)
  tf.keras.Input(shape=(num_feature_one_dimension, num_feature_one_dimension)),
  # flatten each 28*28 image into a vector of 784 features
  tf.keras.layers.Flatten(),
  # First fully-connected hidden layer.
  tf.keras.layers.Dense(n_hidden_1, activation='relu'),
  # Second fully-connected hidden layer.
  tf.keras.layers.Dense(n_hidden_2, activation='relu'),
  # output layer: one logit per class, no activation
  tf.keras.layers.Dense(num_classes)
])

For each example, the model returns a vector of logits or log-odds scores, one for each class.

predictions = model(x_train[:1]).numpy()
predictions

array([[ 0.23746307,  0.13589239,  0.24433313,  0.3890145 , -0.24540573,
        -0.01380943, -0.39836118, -0.37099135,  0.31111696,  0.00561234]],
      dtype=float32)

The tf.nn.softmax function converts these logits to probabilities for each class:

tf.nn.softmax(predictions).numpy()

array([[0.11887838, 0.10739679, 0.11969788, 0.13833141, 0.07334924,
        0.09246483, 0.06294596, 0.06469258, 0.12796475, 0.09427822]],
      dtype=float32)
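For intuition, softmax can be written in a few lines of NumPy. This is a sketch of the standard formula, not TensorFlow's implementation, and the example logits are made up.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in exp.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits)
print(probs, probs.sum())
```

The outputs are positive, sum to 1 along the last axis, and preserve the ordering of the logits, so the largest logit gets the largest probability.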

Note: It is possible to bake the tf.nn.softmax function into the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it’s impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.

Define a loss function for training using losses.SparseCategoricalCrossentropy, which takes a vector of logits and the index of the true class and returns a scalar loss for each example.

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

This loss is equal to the negative log probability of the true class, so it is zero if the model is certain of the correct class.

This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to -tf.math.log(1/10) ~= 2.3.

loss_fn(y_train[:1], predictions).numpy()

np.float32(2.3809268)
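The same quantity can be computed by hand for a single example. The NumPy sketch below works directly from logits using the log-sum-exp trick, which is the kind of numerically stable calculation that from_logits=True makes possible; the helper name and inputs are hypothetical.

```python
import numpy as np

def sparse_cross_entropy_from_logits(logits, label):
    # log-softmax via the log-sum-exp trick: never exponentiates a large positive number
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]  # negative log probability of the true class

# Ten equal logits: every class has probability 1/10, so the loss is log(10).
uniform_loss = sparse_cross_entropy_from_logits(np.zeros(10), label=3)
# A very large logit on the true class: the loss is essentially zero, with no overflow.
confident_loss = sparse_cross_entropy_from_logits(np.array([0.0, 1000.0, 0.0]), label=1)
print(uniform_loss, confident_loss)
```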

Before you start training, configure and compile the model using Keras Model.compile. Set the optimizer class to adam, set the loss to the loss_fn function you defined earlier, and specify a metric to be evaluated for the model by setting the metrics parameter to accuracy.

model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

Train and evaluate your model

Use the Model.fit method to adjust your model parameters and minimise the loss.

The Model.fit method trains the model for a fixed number of epochs (iterations on a dataset). In this example, we will train the model for 10 epochs.

In the Model.fit method, the verbose parameter controls how much output is printed during training:

verbose=1: Default. An animated progress bar for every single epoch (very noisy).
verbose=0: Silent. No output.
verbose=2: One line per epoch; shows the final stats for the epoch once it’s finished; much cleaner.

model.fit(x_train, y_train, epochs=10, verbose=2)

Epoch 1/10
1875/1875 - 4s - 2ms/step - accuracy: 0.9328 - loss: 0.2229
Epoch 2/10
1875/1875 - 3s - 2ms/step - accuracy: 0.9725 - loss: 0.0914
Epoch 3/10
1875/1875 - 3s - 2ms/step - accuracy: 0.9785 - loss: 0.0661
Epoch 4/10
1875/1875 - 3s - 2ms/step - accuracy: 0.9846 - loss: 0.0483
Epoch 5/10
1875/1875 - 3s - 2ms/step - accuracy: 0.9869 - loss: 0.0406
Epoch 6/10
1875/1875 - 3s - 2ms/step - accuracy: 0.9889 - loss: 0.0328
Epoch 7/10
1875/1875 - 3s - 2ms/step - accuracy: 0.9910 - loss: 0.0278
Epoch 8/10
1875/1875 - 3s - 2ms/step - accuracy: 0.9919 - loss: 0.0233
Epoch 9/10
1875/1875 - 3s - 2ms/step - accuracy: 0.9926 - loss: 0.0221
Epoch 10/10
1875/1875 - 3s - 2ms/step - accuracy: 0.9941 - loss: 0.0179

<keras.src.callbacks.history.History at 0x7f3caf86a110>
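The 1875 steps per epoch in the log follow from the dataset size and the batch size. Model.fit was called without a batch_size argument, so Keras uses its default of 32. The arithmetic, in plain Python:

```python
import math

batch_size = 32          # Keras default when batch_size is not passed
train_examples = 60_000
test_examples = 10_000

train_steps = math.ceil(train_examples / batch_size)  # steps per training epoch
test_steps = math.ceil(test_examples / batch_size)    # steps for one pass over the test set
print(train_steps, test_steps)  # 1875 313
```

The second number is the step count to expect when the 10,000 test images are processed with the same default batch size.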

The Model.evaluate method checks the model’s performance, usually on a validation set or test set.

model.evaluate(x_test,  y_test, verbose=2)

313/313 - 0s - 1ms/step - accuracy: 0.9771 - loss: 0.1007

[0.1006733700633049, 0.9771000146865845]
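The accuracy value is the fraction of examples whose highest-scoring class matches the label. A small NumPy sketch with made-up logits and labels (three classes, four examples) shows the calculation:

```python
import numpy as np

logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2, 0.3],
                   [0.0, 0.1, 3.0],
                   [2.0, 2.5, 0.0]])
labels = np.array([1, 0, 2, 0])

predicted = np.argmax(logits, axis=1)    # index of the largest logit in each row
accuracy = np.mean(predicted == labels)  # fraction of rows where prediction equals label
print(predicted, accuracy)               # [1 0 2 1] 0.75
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits or of the probabilities gives the same predicted class.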

The image classifier is now trained to ~98% accuracy on this dataset. To learn more, read the TensorFlow tutorials.

If you want your model to return a probability, you can wrap the trained model, and attach the softmax to it:

probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])

To check the prediction for a single test image (here, index 20):

#probability_model(x_test[:1])
predictions=probability_model.predict(x_test)

index=20

print(np.argmax(predictions[index]))
plt.imshow(x_test[index])

9

Conclusion

Congratulations! You have trained a machine learning model on a prebuilt dataset using the Keras API.

For more examples of using Keras, check out the tutorials. To learn more about building models with Keras, read the guides. If you want to learn more about loading and preparing data, see the tutorials on image data loading or CSV data loading.

References and recommendations

Some materials are from the Machine Learning with Big Data (SPCE0038) module at UCL.