Adam Dennett - a.dennett@ucl.ac.uk
1st August 2024
CASA0007 Quantitative Methods
Motivation:
flowchart TD
A["<img src='L6_images/Data.png'>"] --> B[<img src='L6_images/Book.png'>]
B[<img src='L6_images/Book.png'>] --> A["<img src='L6_images/Data.png'>"]
flowchart TD
I{Choosing a <br/>statistical test} -->
A[How Many Variables?] -->
C(2?) & G(More than 2?)
C -->|Categorical?| D(Chi Squared <br/>or Similar e.g. T-test)
C -->|Scale or Ratio?| E(Pearson Correlation <br/>or Spearman's Rank)
C -->|Both?| F(Ask Google)
G --> H(REGRESSION - everything else, <br/>get in the bin)
lm() (in R) uses a method called Ordinary Least Squares (OLS) to find the line of best fit.
\[Y = \beta_0 + \beta_1X_1 + \epsilon\]
\[\hat{Y} = 62.35 + (-0.63 \times X) + \epsilon\]
\[40.3 = 62.35 + (-0.63 \times 35) + 0\]
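OLS chooses the intercept and slope that minimise the sum of squared residuals. With a single predictor the solution has a closed form: the slope is cov(X, Y) / var(X) and the intercept is mean(Y) minus the slope times mean(X). The sketch below checks this against lm() on simulated data; the data frame `sim` and the values used to generate it are made up for illustration and are not the Brighton school data.

```r
# Simulated data for illustration only; not the Brighton school data.
set.seed(1)
sim <- data.frame(x = runif(50, min = 5, max = 45))
sim$y <- 60 - 0.5 * sim$x + rnorm(50, sd = 4)

# Closed-form OLS estimates for a single predictor
b1 <- cov(sim$x, sim$y) / var(sim$x)   # slope
b0 <- mean(sim$y) - b1 * mean(sim$x)   # intercept

# lm() should return the same two numbers
fit <- lm(y ~ x, data = sim)
c(intercept = b0, slope = b1)
coef(fit)
```

The two printed vectors should agree to floating-point precision, which is one way to see that lm() is not doing anything more mysterious than this calculation.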

Call:
lm(formula = ATT8SCR ~ PTFSM6CLA1A, data = btn_sub)
Residuals:
Min 1Q Median 3Q Max
-8.300 -2.165 1.400 2.567 4.738
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 62.3457 3.7367 16.685 1.68e-07 ***
PTFSM6CLA1A -0.6292 0.1492 -4.217 0.00293 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.087 on 8 degrees of freedom
Multiple R-squared: 0.6897, Adjusted R-squared: 0.651
F-statistic: 17.78 on 1 and 8 DF, p-value: 0.002927
lm() is the function that fits a linear model
ATT8SCR (Attainment 8 Score) is the dependent variable \(Y\)
~ means “is modelled by”
PTFSM6CLA1A (% Disadvantaged Students) is the independent variable \(X\)
data = btn_sub is the dataset we are using, which contains the variables
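To make the pieces of the call concrete, here is a minimal sketch that fits the same formula and pulls individual numbers out of the fitted model rather than reading them off the printed summary. The data frame `schools` is a hypothetical stand-in for btn_sub with simulated values, so its estimates will not match the output shown above.

```r
# Hypothetical stand-in for btn_sub; the values are simulated.
set.seed(42)
schools <- data.frame(PTFSM6CLA1A = runif(10, min = 10, max = 40))
schools$ATT8SCR <- 60 - 0.5 * schools$PTFSM6CLA1A + rnorm(10, sd = 4)

# Y ~ X: Attainment 8 is modelled by % disadvantaged students
model <- lm(ATT8SCR ~ PTFSM6CLA1A, data = schools)
model_summary <- summary(model)

coef(model)                   # named vector: (Intercept), PTFSM6CLA1A
model_summary$r.squared       # Multiple R-squared
model_summary$adj.r.squared   # Adjusted R-squared
model_summary$coefficients    # Estimate, Std. Error, t value, Pr(>|t|)
```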

Variation in % disadvantaged students in schools in Brighton appears to explain about 65-69% of the variation in Attainment 8 at the school level (adjusted and multiple R-squared respectively)
The % disadvantaged students is a statistically significant predictor and the relationship appears to be linear
A 1 percentage point reduction in the share of disadvantaged students in a school appears to be associated with a 0.63 point increase in Attainment 8, and vice versa. So the council was right? Well, not quite…
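The interpretation of the slope can be checked by plugging the reported btn_sub estimates into the fitted line. This is a small sketch; the helper name `predict_att8` is made up for illustration, and the coefficients are copied from the summary output above.

```r
# Coefficients copied from the btn_sub summary output
b0 <- 62.3457   # intercept
b1 <- -0.6292   # slope on PTFSM6CLA1A

# Hypothetical helper: fitted Attainment 8 for a given % disadvantaged
predict_att8 <- function(pct_disadvantaged) b0 + b1 * pct_disadvantaged

predict_att8(35)                      # about 40.3
predict_att8(34) - predict_att8(35)   # about 0.63 points higher
```

With only 10 schools behind these estimates, the numbers describe this small sample rather than a general rule.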
Call:
lm(formula = ATT8SCR ~ PTFSM6CLA1A, data = btn_edit)
Residuals:
Min 1Q Median 3Q Max
-4.4563 -2.4489 -0.5964 2.4240 6.0536
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 54.5204 3.2593 16.728 1.65e-07 ***
PTFSM6CLA1A -0.2025 0.1252 -1.617 0.145
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.476 on 8 degrees of freedom
Multiple R-squared: 0.2462, Adjusted R-squared: 0.152
F-statistic: 2.614 on 1 and 8 DF, p-value: 0.1446
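In the btn_edit fit the slope is no longer statistically significant at the 5% level. The t value, p-value and a 95% confidence interval can be reproduced from the reported estimate and standard error, as sketched below. The inputs are the rounded figures printed above, so the results match the output only approximately.

```r
# Values copied from the btn_edit summary output
est <- -0.2025   # slope estimate
se  <- 0.1252    # standard error
df  <- 8         # residual degrees of freedom

t_value <- est / se                       # about -1.617
p_value <- 2 * pt(-abs(t_value), df)      # two-sided, about 0.145

# 95% confidence interval for the slope
ci <- est + c(-1, 1) * qt(0.975, df) * se
ci                                        # spans zero
```

Because the interval includes zero, this sample does not rule out there being no linear relationship at all.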
Call:
lm(formula = ATT8SCR ~ PTFSM6CLA1A, data = england_filtered)
Residuals:
Min 1Q Median 3Q Max
-36.268 -4.894 -1.345 3.662 33.140
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 56.955578 0.280566 203.00 <2e-16 ***
PTFSM6CLA1A -0.377548 0.009251 -40.81 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.59 on 3248 degrees of freedom
(101 observations deleted due to missingness)
Multiple R-squared: 0.339, Adjusted R-squared: 0.3388
F-statistic: 1665 on 1 and 3248 DF, p-value: < 2.2e-16
\[log(Y) = \beta_0 + \beta_1log(X_1) + \epsilon\]
Call:
lm(formula = log(ATT8SCR) ~ log(PTFSM6CLA1A), data = england_filtered)
Residuals:
Min 1Q Median 3Q Max
-1.20789 -0.08730 -0.00789 0.08197 0.56024
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.475612 0.012893 347.14 <2e-16 ***
log(PTFSM6CLA1A) -0.207443 0.004054 -51.18 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1438 on 3246 degrees of freedom
Multiple R-squared: 0.4465, Adjusted R-squared: 0.4464
F-statistic: 2619 on 1 and 3246 DF, p-value: < 2.2e-16
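In a log-log model the slope is read as an elasticity: a given percentage change in X is associated with an approximate percentage change in Y. The sketch below works through that arithmetic using the coefficient reported above; the variable names are chosen for illustration.

```r
# Slope copied from the log-log summary output
b1 <- -0.207443

# Percentage change in ATT8SCR associated with a 1% increase in PTFSM6CLA1A
pct_change_1 <- (1.01^b1 - 1) * 100    # roughly -0.21%

# The same calculation for a 10% increase
pct_change_10 <- (1.10^b1 - 1) * 100   # roughly -2%

c(pct_change_1, pct_change_10)
```

For small changes the result is close to the coefficient itself, which is why the slope is often quoted directly as "a 1% increase in X goes with a 0.21% fall in Y".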
\[Y = \beta_0 + \beta_1log(X_1) + \epsilon\]
Call:
lm(formula = ATT8SCR ~ log(PTFSM6CLA1A), data = england_filtered)
Residuals:
Min 1Q Median 3Q Max
-43.631 -4.377 -0.919 3.504 34.071
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 79.0199 0.6080 129.97 <2e-16 ***
log(PTFSM6CLA1A) -10.3070 0.1911 -53.92 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.78 on 3246 degrees of freedom
Multiple R-squared: 0.4725, Adjusted R-squared: 0.4723
F-statistic: 2908 on 1 and 3246 DF, p-value: < 2.2e-16
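When only X is logged, the slope relates a proportional change in X to a change in Y measured in points. A sketch of that reading, using the coefficient reported above (variable names are illustrative):

```r
# Slope copied from the level-log summary output
b1 <- -10.3070

# Change in Attainment 8 points for a 1% increase in PTFSM6CLA1A
change_1pct <- b1 * log(1.01)    # about -0.10 points

# Change in points when PTFSM6CLA1A doubles (e.g. 10% to 20%)
change_double <- b1 * log(2)     # about -7.1 points

c(change_1pct, change_double)
```

Under this model the same absolute rise in % disadvantaged students matters more at the low end of the range than at the high end.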
© CASA | ucl.ac.uk/bartlett/casa