How to add year and industry fixed effects to a regression

Hey everyone,

I am trying to add SIC2 sector and year fixed effects to my regression, but I am not sure whether my current method is correct. I made dummy variables for both as follows:

dummyFyear <- reorder(Thesis2$fyear, Thesis2$gdwlia, FUN = mean)
dummyindustry <- reorder(Thesis2$sic2, Thesis2$gdwlia, FUN = mean)
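For reference, `reorder()` (base R, `stats` package) already returns a factor; it only changes the order of the levels, sorting them by the group mean of the second argument. A small sketch using the year and goodwill values from the reproducible example below shows this (`yr_reordered` is an illustrative name, not from the original code):

```r
# Year and goodwill-impairment values copied from the reproducible example
fyear  <- c(2005, 2006, 2007, 2008, 2007, 2008, 2009, 2005, 2006)
gdwlia <- c(-50, -65, 1, -100, -200, -250, -32, -40, -8)

# reorder() converts fyear to a factor and sorts its levels by mean(gdwlia) per year
yr_reordered <- reorder(fyear, gdwlia, FUN = mean)
class(yr_reordered)    # "factor"
levels(yr_reordered)   # "2008" "2007" "2005" "2006" "2009"

# A plain factor keeps the natural (sorted) order of the years
levels(factor(fyear))  # "2005" "2006" "2007" "2008" "2009"
```

In `lm()` the first level becomes the reference category, so the `reorder()` step only decides which year serves as the baseline (here 2008, the year with the lowest mean), not the fit itself.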

My question: can I add these dummy variables directly to my regression, or do I need to treat them differently?
Thank you in advance!
Reproducible example:

gvkey <- c(1, 1, 1, 1, 2, 2, 2, 4, 4)
fyear <- c(2005, 2006, 2007, 2008, 2007, 2008, 2009, 2005, 2006)
sic2 <- c("10-14", "10-14", "10-14", "10-14", "15-17", "15-17", "15-17", "90-98", "90-98")
gdwlia <- c(-50, -65, 1, -100, -200, -250, -32, -40, -8)
cashflow <- c(100, 110, 120, 130, 500, 550, 600, 50, 60)
lagAT <- c(1000, 1500, 1300, 1200, 300, 500, 800, 70, 40)

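reprex <- data.frame(gvkey, fyear, sic2, gdwlia, cashflow, lagAT)

# Build the data frame first, then store the factors in it so lm() finds them via `data`
reprex$dummyFyear <- reorder(reprex$fyear, reprex$gdwlia, FUN = mean)
reprex$dummyindustry <- reorder(reprex$sic2, reprex$gdwlia, FUN = mean)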

model <- gdwlia ~ cashflow + lagAT + dummyFyear + dummyindustry
resultatenmodel <- lm(model, data = reprex)
summary(resultatenmodel)
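The same model can be written without `reorder()` by wrapping the variables in `factor()` directly in the formula. This sketch rebuilds the data frame (values copied from above; `fe_formula`, `X` and `fe_model` are illustrative names) and uses `model.matrix()` to list the indicator columns `lm()` creates. With the default treatment coding, 2005 and "10-14" become the reference levels.

```r
# Data copied from the reproducible example above
reprex <- data.frame(
  gvkey    = c(1, 1, 1, 1, 2, 2, 2, 4, 4),
  fyear    = c(2005, 2006, 2007, 2008, 2007, 2008, 2009, 2005, 2006),
  sic2     = c("10-14", "10-14", "10-14", "10-14", "15-17", "15-17", "15-17", "90-98", "90-98"),
  gdwlia   = c(-50, -65, 1, -100, -200, -250, -32, -40, -8),
  cashflow = c(100, 110, 120, 130, 500, 550, 600, 50, 60),
  lagAT    = c(1000, 1500, 1300, 1200, 300, 500, 800, 70, 40)
)

# Year and industry fixed effects: factor() tells lm() to treat them as categories
fe_formula <- gdwlia ~ cashflow + lagAT + factor(fyear) + factor(sic2)

# Inspect the design matrix lm() builds: one column per non-reference level
X <- model.matrix(fe_formula, data = reprex)
colnames(X)

fe_model <- lm(fe_formula, data = reprex)
```

The toy data has as many rows as coefficients, so it is far too small for meaningful standard errors; run the same formula on the full `Thesis2` data for real results.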

Yours sincerely,

I think you need to look at a good discussion of one-hot encoding for categorical variables in machine learning. The term "dummy variable" is really misleading and causes endless confusion for people learning modeling.
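To make the distinction concrete, `model.matrix()` can show both encodings on the built-in `mtcars` data, where `gear` takes the values 3, 4 and 5. Full one-hot encoding gives one column per level; the treatment (dummy) coding used by regression keeps an intercept and one column per remaining level.

```r
# Treatment (dummy) coding: intercept + (k - 1) indicator columns
treat <- model.matrix(~ factor(gear), data = mtcars)
colnames(treat)   # "(Intercept)" "factor(gear)4" "factor(gear)5"

# Full one-hot encoding: drop the intercept and every level gets its own column
onehot <- model.matrix(~ factor(gear) - 1, data = mtcars)
colnames(onehot)  # "factor(gear)3" "factor(gear)4" "factor(gear)5"

# Each row of the one-hot matrix has exactly one 1
all(rowSums(onehot) == 1)  # TRUE
```

Keeping all k one-hot columns alongside an intercept would make them perfectly collinear (they sum to the intercept column), which is why one level is dropped.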

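The lm() function in R's stats package does the hard work of encoding your categorical variables if they have been cast as factors. By default it uses treatment (dummy) coding: one indicator column for every level except the first, which becomes the reference category absorbed into the intercept.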

lin_mod <- lm(mpg ~ wt + factor(gear), data = mtcars)
summary(lin_mod)

Call:
lm(formula = mpg ~ wt + factor(gear), data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-3.517 -2.358 -0.355  1.850  5.821 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    35.2156     2.8690  12.274 8.72e-13 ***
wt             -4.9090     0.7112  -6.902 1.68e-07 ***
factor(gear)4   2.1631     1.4485   1.493    0.147    
factor(gear)5  -0.9121     1.7519  -0.521    0.607    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.915 on 28 degrees of freedom
Multiple R-squared:  0.7887,	Adjusted R-squared:  0.766 
F-statistic: 34.83 on 3 and 28 DF,  p-value: 1.375e-09
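Which level serves as the baseline is a labelling choice that does not change the fit. A minimal sketch with `relevel()` (base R, `stats`) on the same `mtcars` model; `mtcars2`, `gear_f`, `m_default` and `m_ref5` are illustrative names. The coefficients are re-expressed relative to the new reference, but the fitted values are identical.

```r
# Default: gear = 3 (the first level) is the reference
m_default <- lm(mpg ~ wt + factor(gear), data = mtcars)

# Make gear = 5 the reference level instead
mtcars2 <- transform(mtcars, gear_f = relevel(factor(gear), ref = "5"))
m_ref5 <- lm(mpg ~ wt + gear_f, data = mtcars2)

names(coef(m_ref5))  # "(Intercept)" "wt" "gear_f3" "gear_f4"

# Same column space, so the predictions agree
all.equal(fitted(m_default), fitted(m_ref5))  # TRUE
```

The same reasoning applies to the `reorder()` calls in the question: they only pick the baseline year and industry.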
