Beatrice Taylor - beatrice.taylor@ucl.ac.uk
8th October 2025
Overview of lecture 2
Continued concepts from exploratory data analysis, in particular probability distributions.
How do we understand how likely events are to occur?
Question
What is the probability of someone at UCL being over 190cm?
Answer
Try to understand the distribution of heights.

Statistical tests:
- a formal way to define a threshold for what is an interesting result
- and hence evaluate the hypothesis
By the end of this lecture you should be able to:
Research question
A research question focuses on a specific problem.
Hypothesis
A formal statement that you will seek to prove or disprove.
What do you think is the hypothesis here?
Is it a fair coin?
What’s the probability that it’s fair?
If the coin is fair, how likely would it be to see 7 heads out of 10 flips?
Correct formulation:
If the coin is fair, how likely would it be to see 7 heads out of 10 flips or an even more extreme result?
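As a concrete check of this formulation: under a fair coin the number of heads in 10 flips follows a Binomial(10, 0.5) distribution, so the probability of a result at least as extreme as 7 heads can be computed directly. A minimal sketch using `scipy` (assuming scipy ≥ 1.7 for `binomtest`):

```python
from scipy import stats

# Under H0 the number of heads in 10 flips is Binomial(n=10, p=0.5).
# One-sided "7 heads or more extreme": P(X >= 7).
p_one_sided = stats.binom.sf(6, 10, 0.5)

# Two-sided version, counting extreme results in both directions.
p_two_sided = stats.binomtest(7, n=10, p=0.5, alternative="two-sided").pvalue

print(p_one_sided, p_two_sided)
```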
In five simple steps.
Define the null and alternative hypothesis
\(H_0\) - the null hypothesis
\(H_1\) - the alternative hypothesis
Set your significance level \(\alpha\)
The significance level is the threshold below which you reject the null hypothesis.
Decide what “too unlikely” means before you do the test

Identify the evidence
Calculate the p-value
The p-value is the probability of seeing the evidence, or something even more extreme, if the null hypothesis is true.
Compare p-value with significance level
In order to evaluate our hypothesis we just have to do the five steps:
… where things can go wrong.
The true null hypothesis is incorrectly rejected.
The null hypothesis is true, but the test result leads you to reject it.
This is also called a false positive.
Example: In court a defendant is found guilty despite being innocent.
The false null hypothesis is incorrectly accepted.
The null hypothesis is false, but the test result leads you to accept it.
This is also called a false negative.
Example: In court a defendant is found innocent despite being guilty.
The NHS offers breast cancer screening to all people with breasts between the ages of 50 and 70.
The hypothesis:
\(H_0:\) The individual doesn’t have breast cancer.
\(H_1:\) The individual does have breast cancer.
From NHS Digital.
Type I error: false positive
From NHS Digital.
Type II error: false negative

The hypothesis should not come out of thin air.
Should consider:
It’s important not to make unethical assumptions when choosing the hypothesis.
Example
Police profiling - assumes a correlation between appearance and crime
Correlation: Two variables are statistically related, as one changes so does the other.
Causation: One variable influences the other variable to occur.
Causation implies correlation.
BUT correlation does not imply causation!
Image credit: [Spurious Correlations](https://www.tylervigen.com/spurious/correlation/19598_google-searches-for-report-ufo-sighting_correlates-with_the-number-of-librarians-in-hawaii)
You might not know whether events are correlated, or causing each other
BUT
you should use your contextual understanding to come up with plausible (and ethical) initial questions.

It’s a process
Research question
Are male and female students similar heights?
Research hypothesis
Male and female students are different heights on average.
Define the null and alternative hypothesis
\(H_0\): The mean height of male and female students is the same.
\(H_1\): The mean height of male and female students is different.
Set your significance level
\(\alpha = 0.05\)
Identify the evidence
I’ve collected data from 198 students, as follows:
| Group | Sample Size | Mean (cm) | std (cm) |
|---|---|---|---|
| Female students | 95 | 170 | 5 |
| Male students | 103 | 180 | 6 |
Calculate the p-value
Aha!
How do we do this? We need to know what statistical test to use!
Numerically testing whether the data supports the hypothesis.
Parametric tests
Non-parametric tests
Student’s T-test is used to compare the mean of a dataset - either against a fixed value or against the mean of another dataset.
This is William Sealy Gosset - he was **not** a student.
Image credit: https://en.wikipedia.org/wiki/William_Sealy_Gosset#/media/File:William_Sealy_Gosset.jpg
Calculate:
Which we use to identify the p-value - typically using a ‘look-up table’.
Image credit: https://www.geeksforgeeks.org/data-science/t-test/
Tests whether the population mean is equal to a specific value or not.
The test statistic is calculated as:
\[\begin{align} t = \frac{\bar{x} - \mu_{0}}{s / \sqrt{n}} \end{align}\]
where \(\bar{x}\) is the sample mean, \(\mu_0\) is the hypothesised population mean, \(s\) is the sample standard deviation, and \(n\) is the sample size.
The number of degrees of freedom is the number of values in the final calculation that are free to vary.
\[\begin{align} df = n-1 \end{align}\]

Tests if the population means for two different groups are equal or not.
The test statistic is:
\[\begin{align} t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \end{align}\]
where \(s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\) is the pooled standard deviation, with \(s_1, s_2\) the sample standard deviations and \(n_1, n_2\) the sample sizes.
The number of degrees of freedom is the number of values in the final calculation that are free to vary.
In the two-sample Student’s T-test the degrees of freedom are:
\[\begin{align} df = n_1 + n_2 - 2 \end{align}\]

Tests if the difference between paired measurements for a population is zero or not - normally used with longitudinal data.
The test statistic is:
\[\begin{align} t = \frac{\bar{d}}{s_d / \sqrt{n}} \end{align}\]
where \(\bar{d}\) is the mean of the paired differences, \(s_d\) is the standard deviation of the differences, and \(n\) is the number of pairs.
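As an illustrative sketch (not from the lecture), a paired test on repeated measurements can be run with `scipy.stats.ttest_rel`; the data below are made up:

```python
import numpy as np
from scipy import stats

# Hypothetical longitudinal data: the same 5 individuals measured twice.
before = np.array([171.0, 168.5, 175.2, 169.9, 172.3])
after = np.array([171.4, 169.1, 175.0, 170.6, 172.9])

# Paired t-test: is the mean difference between pairs zero?
t_stat, p_value = stats.ttest_rel(before, after)
print(t_stat, p_value)
```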
Tests can be one-tailed or two-tailed - which you want is determined when you define the hypothesis.

One-tailed: if you only care whether the mean differs in one particular direction
Two-tailed: if you care about the mean being different, regardless of direction (see the sketch below)
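In `scipy` the tail is chosen with the `alternative` argument (available in recent versions, roughly scipy ≥ 1.6). A minimal sketch with made-up samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(170, 5, size=50)  # made-up heights
group_b = rng.normal(180, 6, size=50)

# Two-tailed: is the mean different in either direction?
t_two, p_two = stats.ttest_ind(group_a, group_b, alternative="two-sided")

# One-tailed: is the mean of group_a smaller than that of group_b?
t_one, p_one = stats.ttest_ind(group_a, group_b, alternative="less")

print(p_two, p_one)
```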

Tests if a sample dataset came from a known distribution.
The Kolmogorov–Smirnov test statistic is:
\[\begin{align} D_n = \sup_x \, | F_n(x) - F(x) | \end{align}\]
where \(F_n\) is the empirical distribution function of the sample and \(F\) is the cumulative distribution function of the reference distribution.
Note
‘\(\sup\)’ is the supremum - think of it as the smallest upper bound.

The empirical distribution function (EDF) is:
\[\begin{align} F_{n}(x) = \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty ,x]}(X_{i}) \end{align}\]
where \(1_{(-\infty ,x]}(X_{i})\) is the indicator function, equal to 1 if \(X_i \le x\) and 0 otherwise, and \(X_1, \dots, X_n\) are the sample observations.
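As a minimal sketch of this formula (function and variable names are my own), the EDF at a point \(x\) is just the fraction of observations less than or equal to \(x\):

```python
import numpy as np

def edf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    sample = np.asarray(sample)
    return np.mean(sample <= x)

heights = [168, 172, 175, 181, 190]  # made-up data
print(edf(heights, 175))  # 3 of 5 observations are <= 175, so 0.6
```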
Tests if the underlying distributions of two sample datasets are the same.
For the two-sample test:
\[\begin{align} D_{n,m} = \sup_x \, | F_n(x) - G_m(x) | \end{align}\]
where \(F_n\) and \(G_m\) are the empirical distribution functions of the first and second samples, of sizes \(n\) and \(m\).
The hypotheses would be:
- \(H_0\): the two samples are drawn from the same distribution
- \(H_1\): the two samples are drawn from different distributions
Larger values of the test statistic \(D\) are stronger evidence against \(H_0\).
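A minimal sketch of the two-sample test with `scipy.stats.ks_2samp`, on made-up samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample_1 = rng.normal(170, 5, size=95)   # made-up heights
sample_2 = rng.normal(180, 6, size=103)

# Two-sample Kolmogorov-Smirnov test: do the samples share a distribution?
d_stat, p_value = stats.ks_2samp(sample_1, sample_2)
print(d_stat, p_value)
```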
It’s easy to fit a KDE to data in Python:
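For example, a minimal sketch with `scipy.stats.gaussian_kde` on made-up data (the original slide may have used a different library, such as seaborn):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
heights = rng.normal(175, 6, size=198)  # made-up heights

# Fit a Gaussian kernel density estimate and evaluate it on a grid.
kde = stats.gaussian_kde(heights)
grid = np.linspace(155, 200, 200)
density = kde(grid)

# density can now be plotted, e.g. with matplotlib: plt.plot(grid, density)
```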

Define the null and alternative hypothesis
\(H_0\): The mean height of male and female students is the same.
\(H_1\): The mean height of male and female students is different.
Set your significance level
\(\alpha = 0.05\)
Identify the evidence
Group 1 – female students
\(\bar{x}_1 = 170\), \(s_1 = 5\), \(n_1\) = 95
Group 2 – male students
\(\bar{x}_2 = 180\), \(s_2 = 6\), \(n_2\) = 103
Calculate the p-value
Substituting values:
\[\begin{align} s_p &= \sqrt{\frac{(95-1)\cdot 5^2 + (103-1)\cdot 6^2}{95+103-2}} \approx 5.55 \end{align}\]
Now compute the \(t\)-value:
\[\begin{align} t &= \frac{170 - 180}{5.55 \cdot \sqrt{\tfrac{1}{95} + \tfrac{1}{103}}} \approx -12.7 \end{align}\]
For Student’s T-test we need the degrees of freedom:
\[\begin{align} df = n_1 + n_2 - 2 = 95 + 103 - 2 = 196 \end{align}\]
Traditionally this uses look-up tables - we’re going to use Python.
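As a sketch of what the look-up table does, we can reproduce the calculation above and convert the \(t\)-value to a p-value with the \(t\)-distribution’s survival function (variable names here are my own):

```python
import numpy as np
from scipy import stats

# Summary statistics from the evidence step.
x1, s1, n1 = 170, 5, 95    # female students
x2, s2, n2 = 180, 6, 103   # male students

# Pooled standard deviation and test statistic.
s_p = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t = (x1 - x2) / (s_p * np.sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2

# Two-tailed p-value from the t-distribution.
p_value = 2 * stats.t.sf(abs(t), df)
print(s_p, t, df, p_value)
```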
We can do all the previous calculations in one step using the `scipy` library - we’ll practise this in the tutorial.
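For instance, one way to do this from the summary statistics alone is `scipy.stats.ttest_ind_from_stats` (a sketch; the tutorial may use a different function on the raw data):

```python
from scipy import stats

# Two-sample Student's t-test directly from the summary statistics.
result = stats.ttest_ind_from_stats(
    mean1=170, std1=5, nobs1=95,    # female students
    mean2=180, std2=6, nobs2=103,   # male students
    equal_var=True,                 # pooled-variance Student's t-test
)
print(result.statistic, result.pvalue)
```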
Compare p-value with significance level
Now we need to compare the p-value to our significance level.
p-value = \(2.015 \times 10^{-10} < 0.05 = \alpha\)
…we reject \(H_0\)!
And conclude that male and female students have significantly different heights.
We’ve covered:
The practical will focus on establishing and evaluating a research hypothesis in Python.
Make sure you have questions prepared!
© CASA | ucl.ac.uk/bartlett/casa