Creating levels for a factor variable

Does R automatically create levels for factor variables?
I am trying to fit linear models, and two of my variables are factors, one being 'gender' (male or female). Originally I didn't use the levels() function and simply left them as factor variables. Everything ran fine when fitting the linear model, i.e.:
fit <- lm(variableA~., data=dataset)
coef(fit)
it gave me a coefficient for genderMale.
I then decided to use the levels() function, i.e.:
levels(dataset$gender) <- c("0","1")
I cleared the variables and ran my code again, and it made no difference to my answers (my models, my predictions, the mean squared error, etc.).
Does this mean I don't need the levels() function?
I'm not sure if I'm making much sense, and I'm new to R, so I would appreciate any help. Thank you!

Hi, welcome to community.rstudio.com
"If you omit the levels, they’ll be taken from the data in alphabetical order". In other words, when you don't supply levels, factor() takes the unique set of values in your data (as.character(x)), sorted into increasing order of x, and uses them as the levels (see https://r4ds.had.co.nz/factors.html or the help page ?factor). So yes, R creates the levels for you.
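
To make the default ordering concrete, here is a minimal sketch. The gender vector is made-up example data, not taken from the original dataset; it shows the levels factor() picks on its own and how to set a different order explicitly.

```r
# Hypothetical example data (not from the original post)
gender <- c("Male", "Female", "Female", "Male")

# No levels supplied: the unique values are sorted and used as levels
f_default <- factor(gender)
levels(f_default)   # "Female" "Male"

# Levels supplied: the order you give is the order that is kept
f_explicit <- factor(gender, levels = c("Male", "Female"))
levels(f_explicit)  # "Male" "Female"
```

The first level is the one treated as the reference category under the default coding, so the explicit form is mainly useful when you want a different baseline.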

However, factor levels should not be mistaken for contrasts. The default used in model fitting (such as lm) in R is treatment coding (also known as dummy coding) for unordered factors; you can check the defaults with options('contrasts'). For other types of contrasts see https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/
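
As a rough illustration of the difference between levels and contrasts, the sketch below inspects the coding R attaches to a factor. It assumes the contrast options have not been changed from their defaults, and the gender factor is again made-up example data.

```r
# Default contrast settings: contr.treatment for unordered factors,
# contr.poly for ordered factors (assuming they have not been changed)
options("contrasts")

# Hypothetical example factor
gender <- factor(c("Male", "Female", "Female", "Male"))

# Treatment (dummy) coding: the first level, "Female", is the reference
# and gets 0; "Male" gets 1 in a column named "Male"
contrasts(gender)
```

The column name of that matrix is what gets appended to the variable name in the model output, which is where a name like genderMale comes from.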
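If dummy coding for unordered factors is what you want, you usually don't need to touch the contrast settings in R. Renaming the levels with levels() only changes the labels (the coefficient is then called gender1 instead of genderMale), not the coding, which is why your models and predictions stayed the same.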
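
To check this against the question, here is a small sketch with simulated data. The data frame, the age column, and the numbers are all invented for illustration; only the names dataset, gender, and variableA come from the original post. It fits the same model before and after relabelling the levels.

```r
set.seed(1)  # simulated data, purely for illustration

dataset <- data.frame(
  gender = factor(rep(c("Female", "Male"), each = 10)),
  age    = rnorm(20, mean = 40, sd = 10)   # hypothetical second predictor
)
dataset$variableA <- 2 + 3 * (dataset$gender == "Male") +
  0.1 * dataset$age + rnorm(20)

# Model with the original level labels
fit1 <- lm(variableA ~ ., data = dataset)

# Same data, levels relabelled: "Female" -> "0", "Male" -> "1"
relabelled <- dataset
levels(relabelled$gender) <- c("0", "1")
fit2 <- lm(variableA ~ ., data = relabelled)

names(coef(fit1))  # "(Intercept)" "genderMale" "age"
names(coef(fit2))  # "(Intercept)" "gender1"    "age"

# Only the coefficient name changes; estimates and fitted values agree
all.equal(unname(coef(fit1)), unname(coef(fit2)))  # TRUE
all.equal(fitted(fit1), fitted(fit2))              # TRUE
```

Changing the order of the levels (for example with relevel()) would change which category is the reference and therefore the individual coefficients, but the fitted values and predictions would still be the same.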
