In this practical, we will introduce Tensorflow and use it to:
Build a simple two-layer neural network for digit recognition;
Train and evaluate this neural network.
Setting up Google Colab
As installing and configuring TensorFlow on your own laptop can be a pain, we recommend using Google Colab for this practical. Click here to run this practical on Google Colab (a Google account is required).
Resource limits of Google Colab under the free plan:
Memory: up to 12 GB.
Maximum notebook runtime: a notebook can run for at most 12 hours, depending on availability and your usage patterns.
GPU duration: dynamic, up to a few hours. If you use the GPU regularly, runtimes will become shorter and disconnections more frequent.
Very Important - we will use the GPU on Google Colab to accelerate the model training. To do this, go to ‘Runtime’ -> ‘Change runtime type’ -> Select ‘T4 GPU’ -> Save. See below.
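Once the runtime type is switched, you can confirm that TensorFlow actually sees the GPU. A quick check (assuming TensorFlow is already installed in the runtime, as it is on Colab):

```python
import tensorflow as tf

# List the accelerators visible to TensorFlow; on a T4 runtime this should
# contain one entry like PhysicalDevice(name='/physical_device:GPU:0', ...).
gpus = tf.config.list_physical_devices('GPU')
print("GPUs visible:", gpus)
if not gpus:
    print("No GPU found - check 'Runtime' -> 'Change runtime type'.")
```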
If you are following along in your own development environment, rather than Colab, see the install guide for setting up TensorFlow for development.
Note: if you are using your own development environment, please make sure you have upgraded to the latest pip before installing TensorFlow 2 package.
Overview of TensorFlow
TensorFlow is an open source library developed by Google for numerical computation. It is particularly well suited for large-scale machine learning.
TensorFlow is based on the construction of computational graphs. It has evolved considerably since its open-source release in 2015. We will use TF2, which offers many additional features built on top of the core (most importantly tf.keras, discussed in later lectures).
Includes a kind of just-in-time (JIT) compiler to optimise speed and memory usage.
Computational graphs can be saved and exported.
Supports autodiff and provides numerous advanced optimisers.
TensorFlow’s Python API
[Credit: Geron]
TensorFlow’s Architecture
[Credit: Geron]
At the lowest level TensorFlow is implemented in C++ so that it is highly efficient.
We will focus on TensorFlow's Python and Keras interfaces in this practical. In real-world projects, if you use TensorFlow, you will mostly interact with the Keras interface, but sometimes you may want to use the lower-level Python API for greater flexibility.
Hardware
One of the factors responsible for the dramatic recent growth of machine learning and AI is advances in computing power. In particular, GPU/TPU hardware that supports high levels of parallelism.
Central Processing Unit (CPU):
General purpose
Low latency
Low throughput
Sequential
Graphics Processing Unit (GPU):
Specialised (for graphics initially)
High latency
High throughput
Parallel execution
Tensor Processing Unit (TPU):
Specialised for matrix operations
High latency
Very high throughput
Extreme parallel execution
In TensorFlow many operations are implemented in low-level kernels, optimised for specific hardware, e.g. CPUs, GPUs, or TPUs.
TensorFlow’s execution engine will ensure operations are run efficiently (across multiple machines and devices if set up accordingly).
Aside: chips optimised for ML and AI are an active area of development
Key events for the GPU:
- 1999: NVIDIA released the GeForce 256 (the first GPU), originally for gaming;
- 2007: NVIDIA released CUDA (Compute Unified Device Architecture), a software layer that lets programs use the processing power of GPUs to perform general-purpose tasks much faster, including AI;
- 2022: OpenAI launched ChatGPT, which was trained on thousands of NVIDIA A100 GPUs.
Google developed the TPU in 2016.
Graphcore developed the Intelligence Processing Unit (IPU) in 2016.
Groq developed the Language Processing Unit (LPU).
Set up TensorFlow
Import TensorFlow into your programme to get started:
```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

print("TensorFlow version:", tf.__version__)
```
TensorFlow version: 2.21.0
Key data type: tensors
The TensorFlow API centers around "tensors" (essentially multi-dimensional arrays of numbers), which are similar to NumPy's ndarray.
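As a quick illustration (a minimal sketch; the constants here are arbitrary), tensors can be created from Python lists or NumPy arrays and carry a shape and a dtype:

```python
import numpy as np
import tensorflow as tf

# A rank-2 tensor (matrix) from a nested Python list.
t = tf.constant([[1., 2., 3.], [4., 5., 6.]])
print(t.shape)  # (2, 3)
print(t.dtype)  # <dtype: 'float32'>

# Tensors interoperate with NumPy in both directions.
a = np.array(t)            # tensor -> ndarray
u = tf.constant(a) + 10.0  # ndarray -> tensor, with broadcasting
```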
Note: the 'gradients' section is optional and you are not required to understand it. However, we recommend reading through it, as it is key to understanding how neural networks are trained under the hood.
When training neural networks using gradient descent based approaches, we often need to compute the gradients, in particular, the gradient of the cost function with respect to the model weights.
TensorFlow supports automatic differentiation (autodiff), which allows gradients to be computed automatically. Below, we compute gradients analytically, numerically, and using TensorFlow's autodiff functionality.
Consider the function $f(w_1, w_2)$ defined as $f(w_1, w_2) = 3w_1^2 + 2w_1 w_2$. Analytically, $\partial f/\partial w_1 = 6w_1 + 2w_2$ and $\partial f/\partial w_2 = 2w_1$, so at the point $(w_1, w_2) = (5, 3)$ the gradient is $(36, 10)$.
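In code, the function and the evaluation point can be written as follows (the exact form of $f$ used here is reconstructed to be consistent with the gradient values 36 and 10 computed below at $w_1 = 5$, $w_2 = 3$):

```python
# f(w1, w2) = 3*w1**2 + 2*w1*w2 (reconstructed; consistent with the
# gradient values 36 and 10 computed below at w1 = 5, w2 = 3).
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

w1, w2 = 5., 3.
print(f(w1, w2))  # 105.0
```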
If we compute the gradient analytically, we need to derive (and evaluate) an extra function for every partial derivative. This is infeasible in many cases, e.g. large neural networks with hundreds of thousands or millions of parameters.
Computing gradients numerically
Alternatively, the gradient can be approximated numerically by finite differences, e.g. $\partial f/\partial w_1 \approx \big(f(w_1 + \varepsilon, w_2) - f(w_1, w_2)\big)/\varepsilon$ for a small $\varepsilon$.
```python
eps = 1e-6
(f(w1 + eps, w2) - f(w1, w2)) / eps
```
36.000003007075065
(f(w1, w2 + eps) - f(w1, w2)) / eps
10.000000003174137
Note - the gradients computed numerically are approximate.
Computing gradients with Autodiff
Autodiff builds derivatives of each stage of the computational graph so that gradients can be computed automatically and efficiently.
```python
w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)
gradients = tape.gradient(z, [w1, w2])
```
Computing gradients with autodiff requires only one forward and one backward pass, regardless of how many partial derivatives are needed. The results do not suffer from the approximation error of finite differences, although they are still limited by machine-precision arithmetic.
Building a simple neural network using TF
We will build a fully connected neural network with two hidden layers (a.k.a. a multilayer perceptron) with TF. This example uses a low-level approach to better understand the mechanics behind building neural networks and the training process.
Neural Network Overview
MNIST Dataset Overview
We will train the neural network to identify MNIST handwritten digits. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with values from 0 to 255.
In this example, each image will be converted to float32, normalized to [0, 1] and flattened to a 1-D array of 784 features (28*28).
```python
# MNIST dataset parameters.
num_classes = 10  # total classes (0-9 digits).
num_feature_one_dimension = 28  # img shape: 28*28

# Training parameters.
# learning_rate = 0.1
# training_steps = 2000
# batch_size = 256
# display_step = 100

# Network parameters.
n_hidden_1 = 128  # 1st layer number of neurons.
n_hidden_2 = 256  # 2nd layer number of neurons.
```
Load a dataset
Load and prepare the MNIST dataset. Convert the sample data from integers to floating-point numbers:
```python
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# normalisation - convert the sample data (integers in the range 0-255)
# to floating-point numbers in [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0
```
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
```python
# function for visualising digits
def plot_num(number):
    item_index = np.where(y_train[:1000] == number)
    subset = x_train[item_index]
    egs = 5
    fig, axs = plt.subplots(1, egs, figsize=(20, 10))
    for i in range(0, egs):
        axs[i].imshow(subset[i])

for x in range(0, 10):
    plot_num(x)
```
Build a machine learning model
Build a tf.keras.Sequential model by stacking layers.
```python
model = tf.keras.models.Sequential([
    # input layer (28*28), which is flattened before being fed into the network
    tf.keras.layers.Flatten(input_shape=(num_feature_one_dimension, num_feature_one_dimension)),
    # First fully-connected hidden layer.
    tf.keras.layers.Dense(n_hidden_1, activation='relu'),
    # Second fully-connected hidden layer.
    tf.keras.layers.Dense(n_hidden_2, activation='relu'),
    # output layer
    tf.keras.layers.Dense(num_classes)
])
```
For each example, the model returns a vector of logits or log-odds scores, one for each class.
Note: It is possible to bake the tf.nn.softmax function into the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it’s impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.
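To see the logits-to-probabilities relationship concretely, softmax can be applied outside the model. A standalone sketch with made-up logits for a 3-class problem (not taken from the MNIST model above):

```python
import tensorflow as tf

# Made-up logits for a 3-class problem.
logits = tf.constant([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(logits)
print(probs.numpy())         # roughly [[0.659 0.242 0.099]]
print(tf.reduce_sum(probs))  # probabilities sum to 1
```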
Define a loss function for training using losses.SparseCategoricalCrossentropy, which takes a vector of logits and an index of the true class, and returns a scalar loss for each example.
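A minimal sketch of the loss definition, together with a sanity check on uniform logits (where, as noted below, the loss should equal $-\log(1/10) \approx 2.3$):

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# All-zero logits give a uniform predictive distribution over 10 classes,
# so the loss equals -log(1/10) ~= 2.3 whatever the true label is.
uniform_logits = tf.zeros((1, 10))
print(loss_fn(tf.constant([3]), uniform_logits).numpy())  # ~2.3026
```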
This loss is equal to the negative log probability of the true class: The loss is zero if the model is sure of the correct class.
This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to -tf.math.log(1/10) ~= 2.3.
loss_fn(y_train[:1], predictions).numpy()
np.float32(2.3809268)
Before you start training, configure and compile the model using Keras Model.compile. Set the optimizer class to adam, set the loss to the loss_fn function you defined earlier, and specify a metric to be evaluated for the model by setting the metrics parameter to accuracy.
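The compile and training steps described above can be sketched as follows. This snippet uses small synthetic stand-in data so that it is self-contained; in the practical you would pass `x_train`/`y_train` from the MNIST dataset and train for more epochs:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10)
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])

# Synthetic stand-in for MNIST: 64 random 28x28 "images" and labels.
x = np.random.rand(64, 28, 28).astype('float32')
y = np.random.randint(0, 10, size=64)

history = model.fit(x, y, epochs=1, batch_size=32, verbose=0)
print(history.history['loss'])
```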
Congratulations! You have trained a machine learning model on a prebuilt dataset using the Keras API.
For more examples of using Keras, check out the tutorials. To learn more about building models with Keras, read the guides. If you want to learn more about loading and preparing data, see the tutorials on image data loading or CSV data loading.
References and recommendations
Some materials are from Machine Learning with Big Data (SPCE0038) module at UCL.