I am analysing FE, RE and pooled OLS models for panel data (26 cantons, T = 6, N = 156, balanced). All my variables are percentages.
Y = employment rate of canton refugees
x1 = percentage share of jobs in small businesses
x2 = percentage share of jobs in large businesses
Controls = % share of foreigners, cantonal GDP as a percentage of national GDP, unemployment rate of natives
I want to use standard errors clustered by group (canton = state) in my regression models, because standard errors are understated when serial correlation is present, which makes hypothesis tests unreliable.
Since there is only one observation per canton and year, clustering by year and canton is not possible. I cluster by canton because I assume the cantons are independent of each other, given their autonomy under the country's federal system. Years are assumed to be dependent on each other because of lag effects in economic mechanisms.
- Do I need to prove that serial correlation is present, or is it okay to assume it, since observations of the same canton over time are likely to be correlated? If I have to test for serial correlation before clustering the SE, which code do I use?
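For reference, the only candidates I have found so far are the serial-correlation tests that ship with plm. Below is a minimal sketch on the package's built-in Grunfeld data, not on my data; `fe_demo`, `bg` and `wa` are placeholder names, and I am assuming these tests are the relevant ones for a within model:

```r
library(plm)

# Built-in example panel (10 firms, 20 years); stands in for busdata
data("Grunfeld", package = "plm")

# One-way within (FE) model, used here only for illustration
fe_demo <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "within")

# Breusch-Godfrey test for serial correlation in panel models
bg <- pbgtest(fe_demo, order = 1)

# Wooldridge test for AR(1) errors in FE panel models
wa <- pwartest(fe_demo)

print(bg)
print(wa)
```

Both calls return ordinary test objects with a statistic and a p-value. I do not know how reliable they are with T = 6.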
Here is my code:
FE model:
fixedm6 <- plm(Y ~ X + X1 + controls, data = busdata, index = c("canton", "year"), model = "within", effect = "twoways")
FE model with clustered SE:
cfixedm6 <- coeftest(fixedm6, vcov = vcovHC(fixedm6, method = "arellano", type = "HC3", cluster = "group"))
Pooled OLS model:
m6pool <- plm(Y ~ X + X1 + X2, data = busdata, index = c("canton", "year"), model = "pooling")
Pooled OLS with clustered SE:
cm6pool <- coeftest(m6pool, vcov = vcovHC(m6pool, type = "HC3", cluster = "group"))
F-test without clustered SE:
p-value < 2.2e-16, so FE is the better fit
When I pass in the models with clustered SE:
Error in UseMethod("pFtest") : no applicable method for 'pFtest' applied to an object of class "coeftest"
The same occurs with other test functions (e.g. phtest for the Hausman test).
RE model:
randm6 <- plm(Y ~ X + X1 + X2, index = c("canton", "year"), data = busdata, model = "random")
RE model with clustered SE:
crandm6 <- coeftest(randm6, vcov = vcovHC(randm6, method = "white1", type = "HC3", cluster = "group"))
Without clustered SE:
phtest(fixedm6, randm6): the p-value indicates FE is the better fit
With clustered SE:
phtest(cfixedm6, crandm6): Error in UseMethod("phtest") : no applicable method for 'phtest' applied to an object of class "coeftest"
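From the plm documentation, phtest also appears to have a regression-based variant (method = "aux") that takes a robust covariance function directly, instead of a coeftest object. Below is a sketch on the built-in Grunfeld data; `form`, `fe_demo`, `ct` and `ht` are placeholder names, and I have not verified that this is the right approach for my models:

```r
library(plm)
library(lmtest)

# Built-in example panel; stands in for busdata
data("Grunfeld", package = "plm")
form <- inv ~ value + capital

# The fitted model object itself is what the test functions expect
fe_demo <- plm(form, data = Grunfeld,
               index = c("firm", "year"), model = "within")

# coeftest() only produces a coefficient table with cluster-robust SE;
# the result is of class "coeftest", not a plm model
ct <- coeftest(fe_demo,
               vcov = vcovHC(fe_demo, method = "arellano",
                             type = "HC3", cluster = "group"))

# Auxiliary-regression Hausman test with a robust covariance matrix,
# called on the formula and data rather than on coeftest objects
ht <- phtest(form, data = Grunfeld, index = c("firm", "year"),
             method = "aux", vcov = vcovHC)

print(ct)
print(ht)
```

If this is correct, the plm objects stay as they are, and the robust covariance is supplied only to the test or to coeftest when reporting results.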
- Do I have to compare the models without clustered SE first, choose the best model based on the F-test, Hausman test etc., and only then cluster the SE for that model?
Without clustered SE I can easily use these tests to compare the models. However, this seems wrong, considering that the p-values will be distorted. As soon as I include clustered SE, I don't know how to code the comparison and determine which model is the best fit.
How should I approach this in R? Most online resources discuss FE/RE and then discuss clustered SE, but never how to compare models that have clustered SE. What am I doing wrong? Which R code would be best here? Am I using the wrong packages?
Best regards, Jill