Hey everyone,
First of all, I should say that I am very inexperienced, both with econometric methods and with coding. I am learning a lot at the moment, though, and would really like to understand what is going on here.
I think I have roughly figured out what the problem is, but I don't properly understand why it happens.
I have a panel data set with death rates for specific occupations in specific districts and years, i.e., the data have three dimensions (district, occupation, year). I am supposed to run fixed effects regressions with district-by-occupation-group and district-by-year fixed effects.
Although I could never have come up with this specification on my own, I think I understand what it does: the district-by-occupation fixed effects control for unobserved, time-invariant differences between occupational groups within a district. Certain occupational groups may simply differ from others in relevant ways that do not change over time, e.g., they might be more productive or have a higher wage level.
The district-by-year fixed effects control for unobserved, time-varying differences between districts, e.g., a shock that hits district A in a given year but not district B.
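To make the two sets of fixed effects concrete, here is a minimal base-R sketch on a hypothetical, noise-free toy panel (2 districts x 2 occupations x 3 years; every name and number is made up for illustration, and occupation 2 is simply assumed to be blue-collar). The fixed effects enter as one dummy per district-occupation cell and one per district-year cell, and the blue-collar-by-year effect is what is left to estimate.

```r
# Hypothetical toy panel: 2 districts x 2 occupations x 3 years, no noise.
# Occupation 2 is assumed to be blue-collar; all names are illustrative.
panel <- expand.grid(district = c("A", "B"), occ = 1:2, year = 2000:2002)
panel$bluecollar <- as.integer(panel$occ == 2)

# Arbitrary district-by-occupation and district-by-year effects
do_effect <- c(A.1 = 1.0, B.1 = 2.0, A.2 = 0.5, B.2 = 3.0)
dy_effect <- c(A.2000 = 0.0, B.2000 = 0.2, A.2001 = 0.7,
               B.2001 = -0.4, A.2002 = 1.1, B.2002 = 0.3)

# Blue-collar-by-year dummies (2000 is the reference year)
panel$bc_2001 <- panel$bluecollar * (panel$year == 2001)
panel$bc_2002 <- panel$bluecollar * (panel$year == 2002)

# True blue-collar effects: +0.5 in 2001, +1.5 in 2002
panel$deaths <- unname(do_effect[paste(panel$district, panel$occ, sep = ".")]) +
  unname(dy_effect[paste(panel$district, panel$year, sep = ".")]) +
  0.5 * panel$bc_2001 + 1.5 * panel$bc_2002

# District-by-occupation and district-by-year fixed effects as dummies
fit <- lm(deaths ~ bc_2001 + bc_2002 +
            interaction(district, occ) + interaction(district, year),
          data = panel)
coef(fit)[c("bc_2001", "bc_2002")]
```

Neither a separate year term nor a separate bluecollar term appears: the year effects are absorbed by the district-by-year dummies and bluecollar by the district-by-occupation dummies, so only the interaction is identified (lm reports one of the redundant fixed-effect dummies as NA).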
Once I had figured out the code, this first estimation worked. I created a panel data set with the appropriate indices and ran the fixed effects regression:
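library(plm)
# id_year is the "individual" index and id_occ the "time" index in plm's terms
panel_data <- pdata.frame(df, index = c("id_year", "id_occ"))
# model = "within" with plm's default effect = "individual" sweeps out only the
# id_year effects; effect = "twoways" would also sweep out the id_occ effects
model <- plm(deaths_tot_pc ~ factor(year) * bluecollar,
             data = panel_data,
             model = "within")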
In the next step, I am supposed to check for heterogeneous effects by estimating the same effect separately for each occupational group, and this is where I am stuck.
If I run the same model on a subgroup, e.g., occupational group 1, I get the error message "empty model". From what I have read, this happens because there is no within variation left in the regressors, so R drops all of them and nothing remains to estimate.
How would I have to change the following code to get around this problem? I would also appreciate an explanation of the statistical reasoning behind it, as I cannot figure it out on my own right now.
# restrict to occupational group 1 and re-estimate; this call fails with "empty model"
subset_data <- subset(panel_data, occ == 1)
reg <- plm(formula = deaths_tot_pc ~ bluecollar * factor(year),
           data = subset_data,
           model = "within")
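The "no within variation" explanation can be checked on a hypothetical toy panel (base R only, random outcome, illustrative names). Assuming one observation per district, occupation, and year, and assuming bluecollar is fixed for a given occupation, keeping a single occupation leaves exactly one observation per district-year cell. Demeaning by district-year then sets every observation to zero, and bluecollar-by-year collapses into plain year dummies that the district-by-year effects absorb anyway.

```r
# Hypothetical toy panel: one row per district x occupation x year
panel <- expand.grid(district = c("A", "B"), occ = 1:2, year = 2000:2002)
panel$bluecollar <- as.integer(panel$occ == 2)            # assumed: fixed per occupation
panel$id_year <- interaction(panel$district, panel$year)   # district-by-year id
set.seed(42)
panel$deaths <- rnorm(nrow(panel))                          # made-up outcome

sub <- subset(panel, occ == 1)

# Exactly one observation per district-year cell in the subgroup
table(droplevels(sub$id_year))

# Demeaning by district-year leaves nothing: every value equals its cell mean
demeaned <- sub$deaths - ave(sub$deaths, sub$id_year)
all(demeaned == 0)  # TRUE

# bluecollar takes a single value, so bluecollar x year is just a set of year dummies
unique(sub$bluecollar)
```

Adding district-by-occupation effects does not change this: within one occupation they reduce to district effects, while the district-by-year effects still fit every single observation.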
I will try to create a data frame tomorrow so that you can reproduce the problem.
Thank you in advance!
All the best!