Generalized least squares

In statistics, generalized least squares (GLS) is a technique for estimating the unknown parameters in a linear regression model. GLS is applied when the variances of the observations are unequal (heteroscedasticity), or when there is a certain degree of correlation between the observations. In these cases ordinary least squares (OLS) can be statistically inefficient, or even give misleading inferences.

Method outline
In a typical linear regression model we observe data $$\{y_i,x_{ij}\}_{i=1,\dots,n;\ j=1,\dots,p}$$ on n statistical units. The response values are placed in a vector Y = (y1, ..., yn)′, and the predictor values are placed in the design matrix X = (xij), where xij is the value of the jth predictor variable for the ith unit. The model assumes that the conditional mean of Y given X is a linear function of X, whereas the conditional variance of the error term given X is a known matrix Ω. This is usually written as

$$ Y = X\beta + \varepsilon, \qquad \mathrm{E}[\varepsilon|X]=0,\ \operatorname{Var}[\varepsilon|X]=\Omega. $$

Here β is a vector of unknown “regression coefficients” that must be estimated from the data.

Suppose b is a candidate estimate for β. Then the residual vector for b is Y − Xb. The generalized least squares method estimates β by minimizing the squared Mahalanobis length of this residual vector:

$$ \hat\beta = \underset{b}{\operatorname{arg\,min}}\,(Y-Xb)'\,\Omega^{-1}(Y-Xb). $$

Since the objective is a quadratic form in b, the estimator has an explicit formula:

$$ \hat\beta = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}Y. $$
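The explicit formula can be evaluated directly. The following is a minimal NumPy sketch, not a reference implementation: the function name `gls` and the toy data are illustrative assumptions, and linear solves are used in place of explicit matrix inverses as a numerical convenience.

```python
import numpy as np

def gls(X, Y, Omega):
    """Evaluate (X' Omega^-1 X)^-1 X' Omega^-1 Y using linear solves."""
    Oinv_X = np.linalg.solve(Omega, X)   # Omega^-1 X
    Oinv_Y = np.linalg.solve(Omega, Y)   # Omega^-1 Y
    return np.linalg.solve(X.T @ Oinv_X, X.T @ Oinv_Y)

# Hypothetical toy data: n = 4 units, an intercept and one predictor.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
beta_true = np.array([1.0, 2.0])
Y = X @ beta_true                        # noise-free responses
Omega = np.diag([1.0, 4.0, 9.0, 16.0])   # assumed known error covariance

beta_hat = gls(X, Y, Omega)
```

Because the toy responses contain no noise, the estimate coincides with `beta_true` for any positive definite Ω; with Ω equal to the identity matrix the function reduces to ordinary least squares.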

Properties
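The GLS estimator is unbiased, consistent, efficient, and asymptotically normal. Provided the probability limit below exists, as n → ∞:

$$ \sqrt{n}(\hat\beta - \beta)\ \xrightarrow{d}\ \mathcal{N}\!\left(0,\,\left(\operatorname{plim}\,\tfrac{1}{n}X'\,\Omega^{-1}X\right)^{-1}\right). $$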

GLS is equivalent to applying ordinary least squares to a linearly transformed version of the data. To see this, factor Ω = BB′, for instance using the Cholesky decomposition. Then if we multiply both sides of the equation Y = Xβ + ε by B⁻¹, we get an equivalent linear model Y* = X*β + ε*, where Y* = B⁻¹Y, X* = B⁻¹X, and ε* = B⁻¹ε. In this model Var[ε*] = B⁻¹Ω(B⁻¹)′ = I. Thus we can efficiently estimate β by applying OLS to the transformed data, which requires minimizing

$$ (Y^*-X^*b)'(Y^*-X^*b) = (Y-Xb)'\,\Omega^{-1}(Y-Xb). $$

This has the effect of standardizing the scale of the errors and “de-correlating” them. Since OLS is applied to data with homoscedastic, uncorrelated errors, the Gauss–Markov theorem applies, and therefore the GLS estimate is the best linear unbiased estimator for β.
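The equivalence between GLS and OLS on transformed data can be checked numerically. The sketch below assumes a hypothetical covariance matrix with entries 0.6^|i−j| and randomly generated data; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

# Hypothetical covariance: Omega[i, j] = 0.6 ** |i - j| (positive definite).
idx = np.arange(n)
Omega = 0.6 ** np.abs(idx[:, None] - idx[None, :])

B = np.linalg.cholesky(Omega)       # lower triangular factor, Omega = B B'
X_star = np.linalg.solve(B, X)      # X* = B^-1 X
Y_star = np.linalg.solve(B, Y)      # Y* = B^-1 Y

# OLS applied to the transformed data.
beta_whitened, *_ = np.linalg.lstsq(X_star, Y_star, rcond=None)

# Explicit GLS formula applied to the original data.
beta_closed = np.linalg.solve(
    X.T @ np.linalg.solve(Omega, X),
    X.T @ np.linalg.solve(Omega, Y),
)
```

The two estimates, `beta_whitened` and `beta_closed`, should agree up to floating-point rounding.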

Weighted least squares
A special case of GLS called weighted least squares (WLS) occurs when all the off-diagonal entries of Ω are 0. This situation arises when the variances of the observed values are unequal (i.e. heteroscedasticity is present), but no correlations exist among the observations. The weight for unit i is proportional to the reciprocal of the variance of the response for unit i.
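With a diagonal Ω the estimator needs only a vector of weights rather than a full n × n matrix. The sketch below is illustrative: the function name `wls` and the simulated data, in which the error variance is assumed to grow with the predictor, are not taken from any particular source.

```python
import numpy as np

def wls(X, Y, variances):
    """WLS with weights w_i = 1 / variance_i, i.e. Omega = diag(variances)."""
    w = 1.0 / np.asarray(variances)
    Xw = X * w[:, None]                          # Omega^-1 X: each row of X scaled by its weight
    return np.linalg.solve(X.T @ Xw, Xw.T @ Y)   # (X' W X)^-1 X' W Y

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
variances = 0.5 + X[:, 1] ** 2                   # hypothetical variance function
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(variances)

beta_wls = wls(X, Y, variances)

# Same estimate from the general GLS formula with the full diagonal matrix.
Omega = np.diag(variances)
beta_gls = np.linalg.solve(
    X.T @ np.linalg.solve(Omega, X),
    X.T @ np.linalg.solve(Omega, Y),
)
```

Multiplying all variances by a common constant leaves the estimate unchanged, which is why the weights need only be proportional to the reciprocal variances.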

Feasible generalized least squares
Feasible generalized least squares (FGLS) is similar to generalized least squares except that it uses an estimated variance-covariance matrix since the true matrix is not known directly.

The ordinary least squares (OLS) estimator is calculated as usual by

$$ \widehat \beta_{OLS} = (X' X)^{-1} X' Y $$

and estimates of the residuals $$\widehat{u}_j= (Y-X\widehat\beta_{OLS})_j$$ are constructed.

Construct $$ \widehat{\Omega}_{OLS} $$:

$$ \widehat{\Omega}_{OLS} = \operatorname{diag}(\widehat{u}^2_1, \widehat{u}^2_2, \dots, \widehat{u}^2_n). $$

Estimate $$ \beta_{FGLS1}$$ from $$ \widehat{\Omega}_{OLS}$$ by weighted least squares:

$$ \widehat \beta_{FGLS1} = (X'\widehat{\Omega}^{-1}_{OLS} X)^{-1} X' \widehat{\Omega}^{-1}_{OLS} Y $$



The residuals and the covariance estimate are then updated, and the coefficients are re-estimated:

$$ \widehat{u}_{FGLS1} = Y - X \widehat \beta_{FGLS1} $$

$$ \widehat{\Omega}_{FGLS1} = \operatorname{diag}(\widehat{u}^2_{FGLS1,1}, \widehat{u}^2_{FGLS1,2}, \dots, \widehat{u}^2_{FGLS1,n}) $$

$$ \widehat \beta_{FGLS2} = (X'\widehat{\Omega}^{-1}_{FGLS1} X)^{-1} X' \widehat{\Omega}^{-1}_{FGLS1} Y $$

This estimation of $$\widehat{\Omega}$$ can be iterated to convergence given that the assumptions outlined in White hold.
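The iteration described above can be sketched as a short loop. This is an illustration under stated assumptions rather than a definitive implementation: the function name `fgls_diag`, the fixed iteration count and the simulated data are hypothetical, and the `floor` parameter is an added numerical safeguard against dividing by near-zero squared residuals that is not part of the procedure itself.

```python
import numpy as np

def fgls_diag(X, Y, n_iter=2, floor=1e-8):
    """Iterated FGLS with Omega_hat = diag(squared residuals).

    `floor` (an assumption, not part of the described procedure) bounds the
    squared residuals away from zero before they are inverted.
    """
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)     # OLS starting estimate
    history = [beta]
    for _ in range(n_iter):
        u = Y - X @ beta                             # residuals from the current estimate
        w = 1.0 / np.maximum(u ** 2, floor)          # diagonal of Omega_hat^-1
        Xw = X * w[:, None]                          # Omega_hat^-1 X
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ Y)   # weighted least squares step
        history.append(beta)
    return beta, history

# Hypothetical heteroscedastic data: error scale proportional to the predictor.
rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(1, 5, size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * X[:, 1]

beta_fgls2, history = fgls_diag(X, Y, n_iter=2)
```

Here `history[0]` holds the OLS estimate, `history[1]` the first FGLS estimate and `history[2]` the second. A convergence test on successive estimates could replace the fixed iteration count.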

The WLS and FGLS estimators have the following distributions:

$$ \widehat \beta_{WLS} \sim N(\beta, (X'\Omega^{-1}X)^{-1}) $$

$$ \widehat \beta_{FGLS} \sim N(\beta, (X'\widehat{\Omega}_{OLS}^{-1}X)^{-1}(X'\widehat{\Omega}_{OLS}^{-1}\Omega\widehat{\Omega}_{OLS}^{-1}X)(X'\widehat{\Omega}_{OLS}^{-1}X)^{-1}) $$