Analysis with categorical variable

Inuraghe · January 4, 2022, 3:51pm

My dataset consists of a numeric variable (called "N4") and several categorical variables that affect it. For example, there is a categorical variable called "die": when it equals "alpha", N4 takes values around 100, and when it equals "beta", N4 takes values around 300.

My goal is to figure out which of the categorical variables most affects my numeric variable.

Does it make sense to convert the categorical variables into numeric ones and compute correlations? Is there a more effective analysis?

Elijah_Rona · January 4, 2022, 4:06pm

Some models can identify the most important variables even when some predictors are factors, but it is good practice to convert every categorical variable to dummy columns before training the model.

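Dummy columns of 1s and 0s are often easier for models to work with than raw categorical columns: they are numeric, and unlike a single integer code, they impose no artificial order on the levels.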

You can use step_dummy() in recipes, or its equivalents in other packages, to convert categorical variables into dummy columns.
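For readers outside R, here is a minimal plain-Python sketch of the kind of encoding that step_dummy() produces. It is not the recipes implementation. The function name dummy_encode and the level "gamma" are hypothetical. It drops one reference level, which I understand to match step_dummy()'s default (one_hot = FALSE).

```python
def dummy_encode(rows, column):
    """Replace `column` with 0/1 indicator columns, one per non-reference level.

    The first level in sorted order is the reference and gets no column, so a
    row at the reference level has 0 in every indicator.
    """
    levels = sorted({row[column] for row in rows})
    reference, others = levels[0], levels[1:]
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in others:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded, reference


# Hypothetical rows; "gamma" is an invented third level.
rows = [
    {"die": "alpha", "N4": 101.0},
    {"die": "beta", "N4": 302.0},
    {"die": "gamma", "N4": 199.0},
]
encoded, ref = dummy_encode(rows, "die")
for r in encoded:
    print(r)
```

One level is left out because, in a model with an intercept, a full set of indicators always sums to 1. That makes the columns perfectly collinear.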
