Beatrice Taylor - beatrice.taylor@ucl.ac.uk
8th October 2025
Looked at hypothesis testing:
Maths underpins quantitative methods
Image credit: [xkcd](https://xkcd.com/1838/)
By the end of this lecture you should:
The goal is to understand equations like this:
\[\begin{align} y = \sum_{i=1}^n \beta_i x_i \end{align}\]
Equations are often used in the methods sections of papers to describe the model.
Taken from: Chiou, Jou, & Yang, (2015). Factors affecting public transportation usage rate: Geographically weighted regression. Transportation Research Part A: Policy and Practice.
A model which improves after data is taken into account.
Going to be using some mathematical notation
It’s just a formal way of writing maths.
Mathematical notation cheat sheet: https://www.upyesp.org/posts/makrdown-vscode-math-notation/
There are mathematical conventions for how we describe different things.
The power to represent any number!
Summation notation is a compact way to write repeated addition.
\[\begin{align} \sum_{i=1}^n a_i = a_1 + a_2 + a_3 + \dots + a_n \end{align}\]
Example:
\[\begin{align} \sum_{i=1}^5 i = 1+2+3+4+5 = 15 \end{align}\]
Product notation is a compact way to write repeated multiplication.
\[\begin{align} \prod_{i=1}^n a_i = a_1 \cdot a_2 \cdot a_3 \cdot \dots \cdot a_n \end{align}\]
Example:
\[\begin{align} \prod_{i=1}^4 i = 1 \cdot 2 \cdot 3 \cdot 4 = 24 \end{align}\]
\(\epsilon\) is used to mean a small, but arbitrary, number.
Example:
\[\begin{align} y = 2x + \epsilon \end{align}\]
This means \(y\) is equal to \(2\) times \(x\) plus a small value. So if \(x=3\), then we would expect \(y\) to be close to \(6\), but not exactly \(6\).
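The summation, product and \(\epsilon\) examples above can be checked numerically. The following is a minimal Python sketch; the size of \(\epsilon\) (at most \(0.1\) in absolute value) and the random seed are assumptions made purely for illustration.

```python
import math
import random

# Summation: sum_{i=1}^{5} i
total = sum(i for i in range(1, 6))

# Product: prod_{i=1}^{4} i
product = math.prod(i for i in range(1, 5))

# y = 2x + epsilon, with epsilon drawn as a small random number.
# The bound of 0.1 on epsilon is an assumption for this illustration.
random.seed(0)
x = 3
epsilon = random.uniform(-0.1, 0.1)
y = 2 * x + epsilon

print(total)              # 15
print(product)            # 24
print(abs(y - 6) <= 0.1)  # True: y is close to 6
```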
A function is a mathematical operation which maps an input value to an output value.
Mathematical description of a function
\[\begin{align} f(x) = y \end{align}\]
Maps values from a domain \(X\) to a range \(Y\).
\[\begin{align} f(x) = y \text{ for } x \in X, y \in Y \end{align}\]
Domain: the set of all possible input numbers for the function
Example:
In \(f(x)=y\), \(x\) is an element of the domain \(X\).
Range: the set of all possible output numbers from the function
Example:
In \(f(x)=y\), \(y\) is an element of the range \(Y\).
In the applied sciences the domain and range are typically \(\mathbb N\) or \(\mathbb Z\) or \(\mathbb R\)
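To make domain and range concrete, here is a small Python sketch. The function \(f(x) = x^2\) and the five-element domain are hypothetical choices for illustration, not taken from the lecture.

```python
# A hypothetical function f(x) = x^2 on a small, hand-picked domain.
def f(x: int) -> int:
    return x * x

domain = {-2, -1, 0, 1, 2}        # X: the allowed inputs
outputs = {f(x) for x in domain}  # Y: every output the function produces

print(sorted(outputs))  # [0, 1, 4]
```

Here the inputs are integers (\(\mathbb Z\)) and the outputs are non-negative integers, so several inputs can map to the same output.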
Algebra is a way of expressing numbers in a generalised or abstract form.
Example:
\[\begin{align} x \in \mathbb N \end{align}\]
Probability density function of normal distribution
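\[\begin{align} f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \end{align}\]
where \(x \in \mathbb R\), \(\mu\) is the mean and \(\sigma\) is the standard deviation. The density is always positive, and \(f(x) \in [ 0 , 1 ]\) whenever \(\sigma \geq 1/\sqrt{2\pi}\) (for example the standard normal, \(\sigma = 1\)); for smaller \(\sigma\) the peak exceeds \(1\).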
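The density above can be evaluated directly. This is a minimal sketch using only the Python standard library; the function name `normal_pdf` and the default values \(\mu = 0\), \(\sigma = 1\) (the standard normal) are assumptions for illustration.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * sigma**2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma**2)
    return coefficient * math.exp(exponent)

peak = normal_pdf(0.0)   # value at x = mu, the highest point of the curve
left = normal_pdf(-1.0)
right = normal_pdf(1.0)

print(round(peak, 4))             # 0.3989
print(math.isclose(left, right))  # True: the curve is symmetric about mu
```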
Note
\([0, 1]\) is the set of real numbers between \(0\) and \(1\), inclusive of \(0\) and \(1\).
The other function we’ve seen is a linear equation.
\[\begin{align} f(x) = ax + b \end{align}\]
A linear equation is a linear combination of variables.
Examples include:
\[\begin{align} f(x) = ax + b \end{align}\]
Graphically, linear equations are straight lines.
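One way to see the straight line numerically: for equally spaced inputs, a linear function changes by the same amount at every step. The values \(a = 2\) and \(b = 1\) in this sketch are arbitrary example choices.

```python
def f(x: float, a: float = 2.0, b: float = 1.0) -> float:
    """Linear function f(x) = a*x + b; a and b are example values."""
    return a * x + b

xs = [0, 1, 2, 3]
ys = [f(x) for x in xs]

# Difference between consecutive outputs: constant for a straight line.
steps = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]

print(ys)     # [1.0, 3.0, 5.0, 7.0]
print(steps)  # [2.0, 2.0, 2.0]
```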
We can generalise to multiple equations.
They are:
There are many solutions!!!
Many solutions = under-specified
In school you might have solved this using substitution.
There is exactly one solution!
Very hard to solve!
Can be written as:
\[\begin{align} \begin{bmatrix}1&1&1&1\cr1&4&1&1\cr1&4&43&1\cr1&4&7&59\end{bmatrix}\begin{pmatrix}x_1\cr x_2\cr x_3\cr x_4\end{pmatrix} = \begin{pmatrix}10\cr 25\cr 37\cr 1073\end{pmatrix} \end{align}\]
The generalised matrix form (for a 4x4 matrix) is:
\[\begin{align} \begin{bmatrix}a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4}\cr a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4}\cr a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4}\cr a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}\end{bmatrix}\begin{pmatrix}x_1\cr x_2\cr x_3\cr x_4\end{pmatrix} = \begin{pmatrix}y_1 \cr y_2 \cr y_3 \cr y_4\end{pmatrix} \end{align}\]
Matrices are indexed by row (\(m\)) and by column (\(n\)).
\(m=2\), \(n=2\) matrix:
\[\begin{align} \begin{bmatrix}1&1\cr1&4\end{bmatrix} \end{align}\]
\(m=2\), \(n=3\) matrix:
\[\begin{align} \begin{bmatrix}1&1&2\cr1&4&7\end{bmatrix} \end{align}\]
Note
When \(m=n\) we have a square matrix.
We denote matrices by capital letters: \(A\), \(B\), …
Matrix addition is element-wise:
\[\begin{align} (A+B)_{ij} = A_{ij} + B_{ij} \end{align}\]
Example:
\[\begin{align} \begin{bmatrix}1&1\cr1&4\end{bmatrix} + \begin{bmatrix}1&0\cr2&6\end{bmatrix} = \begin{bmatrix}2&1\cr3&10\end{bmatrix} \end{align}\]
Matrix multiplication is row by column.
\[\begin{align} (AB)_{ij} = \sum_{k} A_{ik} B_{kj} \end{align}\]
Example:
\[\begin{align} \begin{bmatrix}1 & 2\cr 3 & 4\end{bmatrix} \begin{bmatrix}5 & 6\cr 7 & 8\end{bmatrix} = \begin{bmatrix} 1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \cr 3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \cr 43 & 50 \end{bmatrix} \end{align}\]
The identity matrix \(I\) acts like the number \(1\) in multiplication.
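The addition and multiplication examples above, and the idea that \(I\) behaves like \(1\), can be reproduced with plain nested lists. This is a sketch only: the helper names `add`, `matmul` and `identity` are hypothetical, and in practice a numerical library would normally be used instead.

```python
Matrix = list[list[int]]

def add(A: Matrix, B: Matrix) -> Matrix:
    """Element-wise sum: (A + B)_ij = A_ij + B_ij."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Row-by-column product: (AB)_ij = sum_k A_ik * B_kj."""
    columns_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B]
            for row in A]

def identity(n: int) -> Matrix:
    """n x n matrix with 1 on the diagonal and 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

summed = add([[1, 1], [1, 4]], [[1, 0], [2, 6]])
A = [[1, 2], [3, 4]]
product = matmul(A, [[5, 6], [7, 8]])

print(summed)                       # [[2, 1], [3, 10]]
print(product)                      # [[19, 22], [43, 50]]
print(matmul(A, identity(2)) == A)  # True
```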
For any compatible matrix \(A\):
\[\begin{align} AI = IA = A \end{align}\]
Example:
\[\begin{align} I = \begin{bmatrix} 1 & 0 & 0 \cr 0 & 1 & 0 \cr 0 & 0 & 1 \end{bmatrix} \end{align}\]
The determinant of a square matrix \(A\) is a scalar value that gives information about:
We write this as \(\det(A)\) or \(|A|\).
For
\[\begin{align} A = \begin{bmatrix} a & b \cr c & d \end{bmatrix} \end{align}\]
the determinant is:
\[\begin{align} \det(A) = ad - bc \end{align}\]
Example:
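As a worked case, taking the \(2 \times 2\) matrix used earlier in the lecture (this choice of matrix is an illustrative assumption, not necessarily the original example):

\[\begin{align} \det \begin{bmatrix} 1 & 1 \cr 1 & 4 \end{bmatrix} = 1 \cdot 4 - 1 \cdot 1 = 3 \end{align}\]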
The inverse of a square matrix \(A\) is denoted \(A^{-1}\) and satisfies:
\[\begin{align} AA^{-1} = A^{-1}A = I \end{align}\]
For a \(2 \times 2\) matrix \(A\):
\[\begin{align} A = \begin{bmatrix} a & b \cr c & d \end{bmatrix} \end{align}\]
if \(\det(A) \neq 0\), then the inverse is:
\[\begin{align} A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \cr -c & a \end{bmatrix}, \quad \text{where } \det(A) = ad - bc \end{align}\]
If \(\det(A) = 0\), the matrix has no inverse.
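The \(2 \times 2\) formula can be sketched in Python as follows. The function names `det2` and `inverse2` are hypothetical, the example matrix is an illustrative choice, and exact fractions are used so that no rounding error appears.

```python
from fractions import Fraction

def det2(A):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = A
    return a * d - b * c

def inverse2(A):
    """Inverse of a 2x2 integer matrix; raises ValueError when det(A) = 0."""
    (a, b), (c, d) = A
    det = det2(A)
    if det == 0:
        raise ValueError("matrix has no inverse (determinant is 0)")
    factor = Fraction(1, det)
    return [[factor * d, factor * -b],
            [factor * -c, factor * a]]

A = [[1, 1], [1, 4]]
A_inv = inverse2(A)

# Multiply A by its inverse; the result should be the identity matrix.
check = [[sum(A[i][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]

print(det2(A))                                 # 3
print([[str(v) for v in row] for row in A_inv])  # [['4/3', '-1/3'], ['-1/3', '1/3']]
print(check == [[1, 0], [0, 1]])               # True
```

Calling `inverse2([[1, 2], [2, 4]])` raises a `ValueError`, because that matrix has determinant \(0\).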
Recall that a system of linear equations can be written compactly as:
\[\begin{align} Ax = y \end{align}\]
where:
- \(A\) is the coefficient matrix
- \(x\) is the vector of unknowns
- \(y\) is the vector of constants
If \(A\) is invertible (i.e. \(\det(A) \neq 0\)), we can solve for \(x\):
\[\begin{align} Ax &= y \\ A^{-1}Ax &= A^{-1}y \\ Ix &= A^{-1}y \\ x &= A^{-1}y \end{align}\]
Thus, the solution exists and is unique whenever \(A\) has an inverse.
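The derivation \(x = A^{-1}y\) can be tried on a small system. The two equations below (\(x_1 + x_2 = 10\) and \(x_1 + 4x_2 = 25\)) are a hypothetical example chosen for illustration, and `solve2` is a hypothetical helper that only handles the \(2 \times 2\) case.

```python
from fractions import Fraction

def solve2(A, y):
    """Solve Ax = y for a 2x2 integer matrix A using x = A^{-1} y."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("A is not invertible, so there is no unique solution")
    # Inverse from the 2x2 formula, kept as exact fractions.
    inv = [[Fraction(d, det), Fraction(-b, det)],
           [Fraction(-c, det), Fraction(a, det)]]
    # Matrix-vector product A^{-1} y.
    return [sum(inv[i][k] * y[k] for k in range(2)) for i in range(2)]

A = [[1, 1], [1, 4]]  # coefficients of x1 + x2 = 10 and x1 + 4*x2 = 25
y = [10, 25]
x = solve2(A, y)

print([int(v) for v in x])  # [5, 5]
```

Substituting back gives \(5 + 5 = 10\) and \(5 + 4 \cdot 5 = 25\), so both equations hold.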
Taken from: Chiou, Jou, & Yang, (2015). Factors affecting public transportation usage rate: Geographically weighted regression. Transportation Research Part A: Policy and Practice.
Link to the paper…
Equation 1:
\[\begin{align} y_i = \beta_0(u_i, v_i) + \sum_{k=1}^p \beta_{ik}(u_i, v_i)x_{ik} + \epsilon_i \end{align}\]
Equation 2:
\[\begin{align} \hat{\beta}(i) = [X^TW(i)X]^{-1}X^TW(i)Y \end{align}\]
where:
The outcome \(y_i\) is explained by an intercept and a weighted combination of predictors, with coefficients that may change depending on the location \((u_i,v_i)\), plus some error.
where:
The estimated coefficients \(\hat{\beta}(i)\) are obtained by solving a weighted least squares problem: take the predictors \(X\), weight them with \(W(i)\), and solve for the coefficients that best fit \(Y\).
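Equation 2 can be sketched for the simplest case of an intercept and a single predictor, where \(X^TW(i)X\) is a \(2 \times 2\) matrix and the inverse formula from earlier applies. Everything specific here is an assumption for illustration: the function name, the toy data, and the hand-picked integer weights. In geographically weighted regression the weights would instead depend on how far each observation is from location \(i\).

```python
from fractions import Fraction

def weighted_least_squares(xs, ys, weights):
    """Return [beta_0, beta_1] = (X^T W X)^{-1} X^T W Y, where each row of X is [1, x].

    Inputs are assumed to be integers so the result is an exact fraction.
    """
    # Entries of the 2x2 matrix X^T W X.
    s_w = sum(weights)
    s_wx = sum(w * x for w, x in zip(weights, xs))
    s_wxx = sum(w * x * x for w, x in zip(weights, xs))
    # Entries of the vector X^T W Y.
    s_wy = sum(w * y for w, y in zip(weights, ys))
    s_wxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    # Determinant of X^T W X, then the 2x2 inverse formula applied to X^T W Y.
    det = s_w * s_wxx - s_wx * s_wx
    if det == 0:
        raise ValueError("X^T W X is not invertible")
    beta_0 = Fraction(s_wxx * s_wy - s_wx * s_wxy, det)
    beta_1 = Fraction(s_w * s_wxy - s_wx * s_wy, det)
    return [beta_0, beta_1]

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]       # toy data lying exactly on y = 1 + 2x
weights = [4, 3, 2, 1]  # hypothetical weights: nearer observations count more

beta = weighted_least_squares(xs, ys, weights)
print([int(b) for b in beta])  # [1, 2]
```

Because the toy data lie exactly on a line, the weights do not change the answer here; with noisy data, different weights would generally give different coefficients at each location.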
We’ve covered:
If in doubt:
Use the maths cheat sheet!
The practical will focus on understanding mathematical equations.
© CASA | ucl.ac.uk/bartlett/casa