Hi joels,
I appreciate your suggestions and timeliness!
You're right that messy is not my outcome variable; it's a contrived example standing in for a few of the independent-variable columns that I'm trying to use in my glm() call. I'm excited that you knew to ask that! You're awesome!
Here's a (hopefully) better example of the data frame that I'm actually using:
outcome_var <- as.factor(c("","barely Yes","","","big yes","","","","Big Yes",
                           "","big yes and crazy","","bigg yes","","","","","Yes",
                           "N","Y","N","N","No","Y","N","N","Y","Si","N","Nope"))
indep_var1 <- as.factor(c("","N","Y","Y/n","","","","","N","Y",
                          "","N","Y","Y/n","","","","","N","Y",
                          "","N","Y","Y/n","","","","","N","Y"))
indep_var2 <- as.factor(c("kinda","","Y","N","non","none","Yup 1","yup 2","","",
                          "kinda","","Y","N","non","none","Yup 1","yup 2","","",
                          "kinda","","Y","N","non","none","Yup 1","yup 2","",""))
set.seed(42)  # arbitrary seed so the sampled columns are reproducible
indep_var3 <- as.factor(sample(20:60, 30, replace = TRUE))
indep_var4 <- as.factor(sample(0:1, 30, replace = TRUE))
messy_2.0 <- data.frame(outcome_var, indep_var1, indep_var2, indep_var3, indep_var4)
(*Note: this example is also contrived. If any of the entries are unclear, please let me know.)
(**Note: I also want the independent variables' entries converted into binary numeric values.)
In your solution, I see that you essentially created a new object called "clean" (genius, by the way!). For messy_2.0, would you recommend that I apply your same solution to each column, build a new data frame from the cleaned-up columns, and then use that new data frame in my analysis?
Or, given the slightly hairier nature of the data set, would you change your recommendations at all?
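To make the question concrete, here is a rough sketch of the per-column approach I have in mind. This is not your code: to_binary, demo, clean_2.0, and the yes/no patterns are hypothetical names and guesses of mine, and anything ambiguous (e.g. "Y/n" or "kinda") is simply left as NA.

```r
# Hypothetical helper (my assumption, not the solution from the earlier reply):
# map yes-like entries to 1, no-like entries to 0, and everything else to NA.
to_binary <- function(x) {
  x <- tolower(trimws(as.character(x)))
  out <- rep(NA_integer_, length(x))
  out[grepl("yes|yup|^y$|^si$", x)] <- 1L          # "big yes", "Yup 1", "Y", "Si", ...
  out[grepl("^(n|no|nope|non|none)$", x)] <- 0L    # "N", "No", "Nope", "non", "none"
  out
}

# Small stand-in for messy_2.0 so the sketch runs on its own.
demo <- data.frame(
  outcome_var = as.factor(c("", "barely Yes", "Big Yes", "N", "Nope", "Si")),
  indep_var1  = as.factor(c("", "N", "Y", "Y/n", "N", "Y")),
  indep_var2  = as.factor(c("kinda", "", "Y", "non", "Yup 1", "none"))
)

# Clean every column, then rebuild a data frame from the cleaned columns.
clean_2.0 <- as.data.frame(lapply(demo, to_binary))
```

I assume factor columns that already hold numbers, like indep_var3 and indep_var4, would need as.integer(as.character(x)) instead of this helper, since calling as.integer() directly on a factor returns the level codes rather than the original values.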
Thanks so much for your help!