Model for a regression with only 2 survey years

Hello everyone,

I have a regression that includes 2 survey years, namely 1999 and 2009. I am not sure whether I should use a panel regression (plm) or just a linear model (lm) with a dummy variable for the respective year (1999 or 2009).
Can anyone give me a recommendation?

I have tried the following:

lm1 <- lm(giniA ~ region + dummy_1999 + age + status, data = fors)

# or should I use plm?

library(plm)
plm1 <- plm(giniA ~ region + age + status, data = fors, index = c("year"), model = "pooling")

Moreover, if I should run a panel regression, which model (pooling, fd, within, ...) would be most suitable for a regression with two survey years?

Many thanks!


As long as you're only comparing two years, you can create a dummy variable (your dummy_1999) that is 1 for the year 1999 and 0 for 2009. If you include more years, create one dummy per year, each set to 1 for its own year and 0 otherwise, and leave one year out of the model as the reference category.
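
As a minimal sketch of that coding: the data frame fors_demo below is simulated and purely hypothetical (only the column names giniA, age and dummy_1999 mirror the question). It suggests that a hand-made dummy and R's factor() give the same year effect:

```r
# Simulated stand-in for the survey data; all values are made up for illustration
set.seed(1)
fors_demo <- data.frame(
  year = rep(c(1999, 2009), each = 50),
  age  = round(runif(100, 18, 80))
)
fors_demo$giniA <- 0.30 + 0.02 * (fors_demo$year == 1999) +
  0.001 * fors_demo$age + rnorm(100, sd = 0.05)

# Hand-made dummy: 1 for 1999, 0 for 2009
fors_demo$dummy_1999 <- as.integer(fors_demo$year == 1999)
fit_dummy <- lm(giniA ~ dummy_1999 + age, data = fors_demo)

# Same model with a factor; relevel so that 2009 is the reference year
fors_demo$year_f <- relevel(factor(fors_demo$year), ref = "2009")
fit_factor <- lm(giniA ~ year_f + age, data = fors_demo)

# Check that the two year coefficients agree
same_coef <- all.equal(unname(coef(fit_dummy)["dummy_1999"]),
                       unname(coef(fit_factor)["year_f1999"]))
print(same_coef)
```

With more than two years, factor() builds one dummy per year except the reference level, so you do not have to create the columns by hand.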

I see you're also using a variable region. Is it numeric? If it is categorical, it should be coded the same way, with one dummy per region apart from a reference region.
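
To see what that coding looks like in R, here is a small sketch with hypothetical region labels (the real values in fors are not known). If region is stored as a factor, lm() builds the 0/1 columns itself. If it is stored as numeric codes, lm() treats it as a single continuous predictor, which is probably not what you want.

```r
# Hypothetical region labels, for illustration only
region <- factor(c("north", "south", "east", "south", "north", "east"))

# model.matrix() shows the 0/1 columns that lm() would build from the factor
X <- model.matrix(~ region)
print(colnames(X))
print(X)
```

The first level in alphabetical order ("east" here) becomes the reference category and gets no column of its own.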

Hope this helps,

Hi PJ,

Thank you very much for your reply.
In this case, how would you theoretically justify the choice of the linear model (with a dummy for the year) instead of choosing a panel regression?

Thanks again!


I have not used panel regression before, but the choice of model depends on your question and the type of data you have. Simple linear regression is a common benchmark when comparing against other machine learning techniques (including more complex forms of regression) because of its simplicity and fairly transparent interpretation.

The question is not so much how you justify it per se, but how it compares with other machine learning methods such as random forests: fit them and see which performs best. In theory, if your regression model performs as well as the more complex methods, you should prefer it because of its simplicity.
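
One way to make such a comparison is out-of-sample error from k-fold cross-validation. The sketch below is only an illustration under stated assumptions: the data are simulated, cv_rmse is a hypothetical helper written for this post, and the "more complex" competitor is just a second linear model with a cubic age term and a year interaction, standing in for something like a random forest so that the example needs base R only.

```r
# Simulated data; values and effect sizes are made up for illustration
set.seed(42)
n <- 200
demo <- data.frame(
  dummy_1999 = rep(c(1, 0), each = n / 2),
  age        = round(runif(n, 18, 80))
)
demo$giniA <- 0.30 + 0.02 * demo$dummy_1999 + 0.001 * demo$age +
  rnorm(n, sd = 0.05)

# Assign every row to one of 5 folds; both models reuse the same split
folds <- sample(rep(seq_len(5), length.out = n))

# Hypothetical helper: cross-validated root mean squared error for giniA
cv_rmse <- function(formula, data, folds) {
  sq_err <- numeric(0)
  for (i in sort(unique(folds))) {
    fit  <- lm(formula, data = data[folds != i, ])   # train on the other folds
    test <- data[folds == i, ]                       # hold out fold i
    pred <- predict(fit, newdata = test)
    sq_err <- c(sq_err, (test$giniA - pred)^2)
  }
  sqrt(mean(sq_err))
}

rmse_simple  <- cv_rmse(giniA ~ dummy_1999 + age, demo, folds)
rmse_complex <- cv_rmse(giniA ~ dummy_1999 * poly(age, 3), demo, folds)
print(c(simple = rmse_simple, complex = rmse_complex))
```

If the two errors are close, the simpler model is the easier one to interpret and report.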

Hope this helps