Huanfa Chen - huanfa.chen@ucl.ac.uk
12 September 2025
Quantitative research is the process of collecting and analysing numerical data to describe, model, and predict variables of interest.
Garbage in, garbage out.
By the end of this lecture you should:
Property | Nominal | Ordinal | Interval | Ratio |
---|---|---|---|---|
Categorizes and labels variables | ✔ | ✔ | ✔ | ✔ |
Ranks categories in order | | ✔ | ✔ | ✔ |
Has known, equal intervals | | | ✔ | ✔ |
Has a true or meaningful zero | | | | ✔ |
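The four properties in the table can be illustrated with a short Python sketch. All variable names and values below are hypothetical and chosen only for illustration; the operation shown for each level is one that the corresponding property makes meaningful, not an exhaustive list.

```python
import statistics

# Hypothetical survey responses; illustrative values only.
transport_mode = ["bus", "bike", "bus", "car", "bus"]        # nominal
satisfaction = ["low", "high", "medium", "medium", "high"]   # ordinal
temperature_c = [18.5, 21.0, 19.5, 23.0, 20.0]               # interval
commute_km = [2.0, 4.0, 8.0, 1.0, 5.0]                       # ratio

# Nominal: values are labels only, so we can count them and report the mode.
most_common_mode = statistics.mode(transport_mode)

# Ordinal: labels can be ranked once their order is made explicit.
order = {"low": 0, "medium": 1, "high": 2}
ranked = sorted(satisfaction, key=order.get)
median_satisfaction = ranked[len(ranked) // 2]

# Interval: equal intervals make differences meaningful,
# but 0 °C is not a true zero, so ratios are not.
temperature_range = max(temperature_c) - min(temperature_c)

# Ratio: a true zero makes ratios meaningful ("8 times as far").
distance_ratio = max(commute_km) / min(commute_km)

print(most_common_mode, median_satisfaction, temperature_range, distance_ratio)
```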
Type | Category | Notes |
---|---|---|
Quantitative (numerical) data | Discrete data | Whole numbers only, e.g. number of staff |
Quantitative (numerical) data | Continuous data | e.g. temperature: 23°C, 23.4°C |
Qualitative (categorical) data | Nominal | Same as nominal in ‘Levels of measurement’ |
Qualitative (categorical) data | Ordinal | Same as ordinal in ‘Levels of measurement’ |
Denote the city populations by \([y_1, y_2, ..., y_n]\), their mean by \(\bar{y}\), and their variance by \(\sigma^2\).
\[ \begin{aligned} \sigma^2 &= \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n} \\ &= \frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \dots + (y_n - \bar{y})^2}{n} \end{aligned} \]
A large variance means the data are widely spread around the mean.
\[ \begin{aligned} \text{Standard Deviation} = \sqrt{\text{Variance}} \end{aligned} \]
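As a minimal sketch of the two formulas above, the following Python snippet computes the variance and standard deviation by hand and checks them against the standard library. The population figures are hypothetical (in thousands) and serve only as an example.

```python
import math
import statistics

# Hypothetical city populations (in thousands); illustrative values only.
populations = [120, 150, 180, 210, 240]

n = len(populations)
mean = sum(populations) / n

# Variance as defined above: average squared deviation from the mean (divide by n).
variance = sum((y - mean) ** 2 for y in populations) / n

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

# The standard library's population functions give the same results.
assert math.isclose(variance, statistics.pvariance(populations))
assert math.isclose(std_dev, statistics.pstdev(populations))

print(f"mean={mean}, variance={variance}, std_dev={std_dev:.2f}")
# prints: mean=180.0, variance=1800.0, std_dev=42.43
```

Note that `statistics.variance` and `statistics.stdev` divide by \(n - 1\) (the sample versions), so they will not match the formula above, which divides by \(n\).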
Type | Source | Handling |
---|---|---|
Error Outliers | Mistakes in data collection, entry, or measurement, e.g. a temperature sensor reading 500 °C | Should be corrected or removed |
Irregular Pattern Outliers | Genuinely occur, but do not follow the general pattern or relationship in the dataset, e.g. sudden spikes in sales on Black Friday | May indicate unusual events or anomalies worth investigating. If the purpose is to study the overall pattern, they should be removed |
Influential Outliers | Appear extreme but are integral to the underlying pattern or model, e.g. NYC in US city population data | Should be kept, as removing them could distort the analysis or overlook important features of the data |
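Before an outlier can be classified into one of these types, it has to be spotted. The sketch below flags candidates with the 1.5 × IQR rule of thumb. This rule is an assumption made here for illustration and is not prescribed by the lecture; the city names and populations (in millions) are hypothetical apart from the NYC example taken from the table.

```python
import statistics

# Hypothetical US city populations in millions; illustrative values only.
city_populations = {
    "City A": 0.4, "City B": 0.5, "City C": 0.6, "City D": 0.7,
    "City E": 0.9, "City F": 1.0, "City G": 1.3, "New York City": 8.5,
}

# Quartiles of the data (statistics.quantiles returns the three cut points).
q1, _, q3 = statistics.quantiles(city_populations.values(), n=4)
iqr = q3 - q1

# Rule of thumb (an assumption, not from the lecture):
# flag values more than 1.5 * IQR beyond the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
flagged = {
    city: pop
    for city, pop in city_populations.items()
    if pop < lower or pop > upper
}

print(flagged)
```

Flagging is only the first step. Here the rule picks out New York City, which the table treats as an influential outlier that should be kept, so deciding how to handle a flagged value still requires judgement about its source.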
We’ve covered:
The practical will focus on setting up a Python environment and describing a dataset.
© CASA | ucl.ac.uk/bartlett/casa