My goal is to use logistic regression on a data set to determine which variables are significant in predicting a binary outcome. I want to tidy the data by assigning binary values to the levels of each column (lumping together levels that are equivalent in meaning) and by changing the data types of the columns to suit logistic regression. In short, I'm trying to convert categorical data, currently stored as factors, into numeric data.
So far, nothing that I've found online has worked. If you could help me solve this problem I would really appreciate it.
Here is an example of a fictional column in my data set. Keep in mind that each column has hundreds of entries that match each of the categorical entries in the "messy" vector below.
messy <- c("N", "Y", "", "Big Y", "(Other)", "NA's")
problem <- as.factor(messy)
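To make the goal concrete, here is a rough sketch of the kind of result I'm after. The mapping is only an assumption for this fictional column: I'm treating "Y" and "Big Y" as equivalent (1), "N" as 0, and everything else as missing. The names `yes_levels`, `no_levels`, and `tidy` are made up for illustration.

```r
# Fictional column, repeated here so the sketch runs on its own
messy <- c("N", "Y", "", "Big Y", "(Other)", "NA's")
problem <- as.factor(messy)

# Assumed (hypothetical) grouping of levels that mean the same thing
yes_levels <- c("Y", "Big Y")
no_levels <- c("N")

# Work with the labels as character strings; calling as.numeric() directly
# on a factor would return the internal level codes, not a 0/1 coding
labels <- as.character(problem)

# 1 for "yes" levels, 0 for "no" levels, NA for anything else
tidy <- ifelse(labels %in% yes_levels, 1,
               ifelse(labels %in% no_levels, 0, NA))
```

If this is a reasonable approach, I would apply the same idea to each column before fitting the model with `glm(..., family = binomial)`.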
Thank you in advance!