
2023-07-13

How Measurement Error Affects OLS

this is another data science question I saw on twitter:

https://twitter.com/ryxcommar/status/1551262675342045184 (account privated as of writing)

with some corrections:

basically we're trying to answer how $\alpha$, $\beta$, etc. change if errors are added to the $y$ values.

note that ryxcommar's notation for the normal distribution is $N(\mu,\sigma)$ instead of $N(\mu,\sigma^2)$, i.e. the second parameter is the standard deviation, not the variance. we'll be using this for the answer.

so $y^* = \alpha + \beta x + \epsilon + u$ where $u$ is measurement error distributed $N(2,2)$

since $x$ isn't affected by the measurement error, $\beta$ is also unaffected.
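to spell out why the slope doesn't move, here's a sketch using the OLS slope written as a covariance ratio. it assumes $u$ is uncorrelated with $x$, which the setup implies but doesn't state outright. $\hat\beta$ and $\hat\beta^*$ are my own labels for the slope estimates from regressing $y$ and $y^*$ on $x$, where $y^* = y + u$:

$$
\hat\beta^* = \frac{\operatorname{cov}(x, y^*)}{\operatorname{var}(x)} = \frac{\operatorname{cov}(x, y) + \operatorname{cov}(x, u)}{\operatorname{var}(x)} = \hat\beta + \frac{\operatorname{cov}(x, u)}{\operatorname{var}(x)}
$$

if $\operatorname{cov}(x, u) = 0$ the last term vanishes in expectation, so the slope is still estimated without bias. it only gets noisier.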

which means $u$ only affects $\alpha$ & $\epsilon$.

OLS residuals always have mean $0$, because any constant is absorbed into the $\alpha$ term. so the mean of $u$ moves into the intercept: $\alpha^* = \alpha + 2$. what's left is $\epsilon^* = \epsilon + (u - 2)$, the sum of an $N(0,1)$ term and an $N(0,2)$ term, with variance $1^2 + 2^2 + 2\,cov(\epsilon,u) = 5 + 2\,cov(\epsilon,u)$. this is why the second correction is nice to have: it lets us drop the covariance term, so $\epsilon^* \sim N(0,\sqrt{5})$
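a quick simulation can sanity-check this. the sketch below is mine, not part of the original question: the true values `alpha = 1.0`, `beta = 3.0`, the choice of $x \sim N(0,1)$, and the helper `ols` are all made up for illustration, and $u$ is drawn independently of $x$ and $\epsilon$. it uses only the Python standard library.

```python
import random
import statistics


def ols(xs, ys):
    """Simple-regression OLS: return (intercept, slope, residual std dev)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    return intercept, slope, statistics.pstdev(resid)


rng = random.Random(0)   # fixed seed so the run is reproducible
n = 100_000
alpha, beta = 1.0, 3.0   # hypothetical true values, not from the question

xs = [rng.gauss(0, 1) for _ in range(n)]    # assumed distribution for x
eps = [rng.gauss(0, 1) for _ in range(n)]   # epsilon ~ N(0, 1): sd 1
us = [rng.gauss(2, 2) for _ in range(n)]    # u ~ N(2, 2): mean 2, sd 2

ys = [alpha + beta * x + e for x, e in zip(xs, eps)]   # y without measurement error
ys_star = [y + u for y, u in zip(ys, us)]              # y* = y + u

a, b, s = ols(xs, ys)
a_star, b_star, s_star = ols(xs, ys_star)

print(f"slope:        {b:.3f} -> {b_star:.3f}")
print(f"intercept:    {a:.3f} -> {a_star:.3f}")
print(f"residual sd:  {s:.3f} -> {s_star:.3f}  (sqrt(5) = {5 ** 0.5:.3f})")
```

the slope should barely move, the intercept should shift by roughly 2, and the residual standard deviation should land near $\sqrt{5}$, up to sampling noise.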

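in conclusion: $\beta^* = \beta$, $\alpha^* = \alpha + 2$, and $\epsilon^* \sim N(0,\sqrt{5})$ (in the $N(\mu,\sigma)$ notation).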

Last modified: January 08, 2024. Website built with Franklin.jl and the Julia programming language.