Original variable predictions from a gls model

I would like to make prediction of gls model using R software.

I’m estimating a model (with autocorrelation and/or heteroscedasticity in the error term) by GLS and would like to make predictions of the dependent variable. Based on the theory of this model, the estimated model transforms the original variables.

Therefore, if the gls transforms the original variables, the predictions, made by the “predict” function from R, will be also the transformed dependent variable. Anyone could help me to clarify this understanding and how to get the original variable predictions?

I'm not sure I understand completely what you are asking. Are you trying to make sure the autocorrelation/heterscedasiticity is present in the predictions? Those primarily affect the variance of each prediction, not the estimated mean value.

Consider making a small example dataset with fitted model that demonstrates what you're talking about. (See some info on making a reproducible example here.) Or you could write out the equations that shows what the model does for predictions vs what you want.

Note you can read a bit more about what predict() does for GLS objects by looking at the help page for predict.gls. That could also be a place to get an example that demonstrates the issue.

1 Like

Thanks for the prompt reply @aosmith.

I would like to estimate and predict a variable using the gls model. By reading about the topic, I checked that the method transforms the variables from a linear model to a new set of observations that satisfy the constant variance assumption and use least squares to estimate the parameters. Two examples are WLS (correcting the heteroscedasticity) and quasi difference (correcting serial correlation) estimators. Please find the attached file to support my understanding.

I can estimate the gls easily but I would like to be sure if the predict.gls function predicts the original variable or the transformed one.

Thanks in advance for your attention.

While I feel pretty confident that you can use the predictions from nlme::gls() as the predictions of the response and not the linearly transformed predictions that was done for the GLS algorithm, I think finding that info will take some sleuthing.

The Pinheiro and Bates book, Mixed Effects Models in S and S-plus, is likely a good place to start looking fore more info. This is the book that package authors wrote about nlme, often considered the "go-to" for understanding the package.

Another idea is to look through the gls() function code. While reading code can be difficult (for me, at least), if you can wade through it you may well be able to see if results (betas, fitted values) are on the original or linearly transformed scale. See the function code by running, e.g., gls() in your R Console. Things look fairly complicated to me so this might not go to far.

My final idea is to create data where you know what the predictions should be on the original scale and then see if gls() recreates them. This could even be something simpler like fitting a model with OLS via lm() and then gls() with heteroskedastic errors argument and see if the predictions change.

2 Likes

Many thanks, @aosmith.

It really looks the predictions of the response and not the transformed variable. At least, by checking for a linear model with AR(1) error, the coefficients of the GLS estimated model and the predictions are quite similar from the linear model estimated by OLS (no serial correlation).

I also estimated a linear model with differenced variables and multiplied by (1 - AR1coefficient), following the rationale described in Wooldridge (2012). These results don't match the GLS estimations/predictions with AR(1) error.

1 Like