Factors and dummy variables

Confused2023 · April 5, 2023, 8:20pm

Hello R users,

My general understanding is that, in R, nominal categorical variables (with 2 or more levels) must first be converted into factors and THEN into dummy variables (k-1 dummy variables for k levels, i.e. dummy encoding). Is that correct?

Once we have done the categorical variable -> factor -> dummy variables transformation, we can use the dummy variables as independent or dependent variables in a statistical model. (P.S.: lm() does the dummy variable conversion automatically, but I am not sure whether that is true for other models.)
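For reference, here is a minimal sketch of what that automatic conversion looks like. The data frame and its column names (y, group, group_f) are made up for illustration. model.matrix() shows the k-1 dummy columns that lm() builds from a factor, and the last lines suggest that lm() also accepts a plain character column and treats it like a factor. Other formula-based functions such as glm() generally build their design matrix the same way, but that is worth checking for any specific package.

```r
# Toy data; all names and values here are hypothetical
df <- data.frame(
  y     = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  group = c("a", "b", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# Explicit factor version of the character column
df$group_f <- factor(df$group)

# The design matrix lm() would use: intercept plus k-1 dummy columns
X <- model.matrix(~ group_f, data = df)
print(colnames(X))

# Fit once with the factor and once with the raw character column
fit_factor <- lm(y ~ group_f, data = df)
fit_char   <- lm(y ~ group,   data = df)

# Compare the estimates, ignoring the coefficient names
same_fit <- isTRUE(all.equal(unname(coef(fit_factor)), unname(coef(fit_char))))
print(same_fit)
```

The first level in alphabetical order ("a" here) becomes the reference category, so it gets no dummy column of its own.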

What if we converted the categorical variable straight into dummy variables, without the intermediate factor() step? Would that still work in R if we passed the dummy variables to a statistical model? I think so, which would mean that we could skip the conversion to factors altogether.
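A quick way to check this is to build the dummies by hand from a character column and compare against the automatic encoding. This is only a sketch on made-up data (the names y, group, group_b and group_c are hypothetical), with "a" chosen as the reference level:

```r
# Toy data; all names and values here are hypothetical
df <- data.frame(
  y     = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  group = c("a", "b", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# Hand-made 0/1 dummies, no factor() involved; "a" is the omitted level
df$group_b <- as.integer(df$group == "b")
df$group_c <- as.integer(df$group == "c")

# Model on the hand-made dummies
fit_manual <- lm(y ~ group_b + group_c, data = df)

# Model that lets lm() encode the character column itself
fit_auto <- lm(y ~ group, data = df)

# Compare the estimates, ignoring the coefficient names
same_fit <- isTRUE(all.equal(unname(coef(fit_manual)), unname(coef(fit_auto))))
print(same_fit)
```

The two fits agree because both drop the same reference level. If a different level is omitted by hand, the coefficients change but the fitted values do not.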

Thanks!

startz · April 5, 2023, 9:33pm

Sure.

In fact, there is a function dummy_cols() in the fastDummies package to help you do exactly that.
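A minimal sketch of how that might look, assuming fastDummies is installed and using a made-up data frame (the column names y and group are hypothetical). Setting remove_first_dummy = TRUE gives the k-1 encoding asked about above:

```r
# Assumes install.packages("fastDummies") has already been run
library(fastDummies)

# Toy data; all names and values here are hypothetical
df <- data.frame(
  y     = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  group = c("a", "b", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# Create dummy columns straight from the character column;
# remove_first_dummy = TRUE drops one level to avoid collinearity
df_dummies <- dummy_cols(
  df,
  select_columns     = "group",
  remove_first_dummy = TRUE
)

# Inspect the new columns
print(names(df_dummies))
```

By default the original group column is kept next to the new dummy columns. Passing remove_selected_columns = TRUE drops it.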

Confused2023 · April 5, 2023, 11:42pm

Thank you.
So there is no need to convert the categorical variables in the imported CSV dataset into factors. We can use dummy_cols() directly on the column data.
