I'm trying to run a linear regression between two categorical/factor variables: the respondent's gender (Gender) and their willingness to share personal data in a particular scenario (Q34).
For gender, respondents are categorised as: male (1), female (2), other (3), and undisclosed (4).
For willingness to share data, respondents are categorised as: unwilling (1), willing (2), and don't know (97).
I'm only interested in respondents who responded as male, female, unwilling, or willing.
So far, I've tried to run a linear regression between these two categorical variables by converting them into dummy variables:
# create factor variable with levels for Q34
wave2$Q34 <- factor(NA,levels=c("1", "2"))
# fill in values based on existing dummy variables
wave2$Q34[wave2$Q34==1] <- "unwilling"
wave2$Q34[wave2$Q34==2] <- "willing"
# linear regression
gender_Q34_regression <- lm(Q34~Gender, data = wave2)
screenreg(gender_Q34_regression)
I'm getting the warning message:
> In `[<-.factor`(`*tmp*`, wave2$Q34_simple == 1, value = c(NA_integer_, :
> invalid factor level, NA generated
Is this because I'm assigning a new value to a factor variable, but the new value is not a valid level of that factor? I think it has something to do with numeric levels vs. string values, but I have no idea how to fix it.
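To illustrate what I suspect is happening, here is a minimal sketch on made-up data. The `toy` data frame and the `Gender_simple` column are hypothetical stand-ins, not my real `wave2` data, and I'm assuming the codes are stored as numbers. The first part reproduces the warning on a single-element factor; the second part tries `factor()` with `levels` and `labels` as one possible way to map the codes to names, which also seems to turn the codes I'm not interested in (97, 3, 4) into NA:

```r
# Toy data standing in for wave2 (hypothetical values, not the real survey)
toy <- data.frame(
  Gender = c(1, 2, 2, 1, 3, 4),
  Q34    = c(1, 2, 97, 2, 1, 2)
)

# Reproduces the warning: "unwilling" is not one of the levels "1" and "2"
f <- factor(NA, levels = c("1", "2"))
f[1] <- "unwilling"   # warning: invalid factor level, NA generated

# Map the numeric codes to labels in one step; any code not listed in
# `levels` (here 97 for Q34, 3 and 4 for Gender) becomes NA
toy$Q34_simple    <- factor(toy$Q34,    levels = c(1, 2), labels = c("unwilling", "willing"))
toy$Gender_simple <- factor(toy$Gender, levels = c(1, 2), labels = c("male", "female"))

# Cross-tabulate to check the recoding, including the NA rows
table(toy$Gender_simple, toy$Q34_simple, useNA = "ifany")
```

If this is right, the same `factor(..., levels = , labels = )` call could presumably be applied to the real columns instead of creating an all-NA factor and filling it in afterwards.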
Thank you!