Turning factor variables into dummy variables?

isla42 · April 5, 2023, 1:50pm

I'm trying to run a linear regression between two categorical (factor) variables: the respondents' gender (Gender) and their willingness to share personal data in a particular scenario (Q34).

For gender, respondents are categorised as: male (1), female (2), other (3), and undisclosed (4).
For willingness to share data, respondents are categorised as: unwilling (1), willing (2), and don't know (97).
I'm only interested in respondents who responded as male, female, unwilling, or willing.
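Restricting the data to those codes can be sketched with `factor()` and explicit `levels`/`labels`: any code not listed in `levels` becomes `NA`, and those rows can then be dropped. This is a minimal sketch on an invented data frame; the raw column names `Gender` and `Q34` follow the post, while `Gender_f`, `Q34_f` and `analysis` are hypothetical names chosen for illustration.

```r
# Invented data in the thread's coding scheme (not the real wave2):
# Gender: 1 = male, 2 = female, 3 = other, 4 = undisclosed
# Q34:    1 = unwilling, 2 = willing, 97 = don't know
wave2 <- data.frame(
  Gender = c(1, 2, 2, 3, 1, 4, 2, 1),
  Q34    = c(1, 2, 97, 2, 2, 1, 1, 97)
)

# factor() with explicit levels and labels maps the codes of interest
# to readable labels; any code not listed in `levels` becomes NA
wave2$Gender_f <- factor(wave2$Gender, levels = c(1, 2),
                         labels = c("male", "female"))
wave2$Q34_f <- factor(wave2$Q34, levels = c(1, 2),
                      labels = c("unwilling", "willing"))

# Keep only respondents with a usable answer on both questions
analysis <- wave2[!is.na(wave2$Gender_f) & !is.na(wave2$Q34_f), ]
print(analysis)
```

Here "other", "undisclosed" and "don't know" all end up as `NA`, so the filter leaves only male/female respondents who answered unwilling or willing.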

So far, I've tried to run a linear regression between these two categorical variables by converting them into dummy variables:

```r
# create factor variable with levels for Q34
wave2$Q34 <- factor(NA,levels=c("1", "2"))

# fill in values based on existing dummy variables
wave2$Q34[wave2$Q34==1] <- "unwilling"
wave2$Q34[wave2$Q34==2] <- "willing"

# linear regression 
gender_Q34_regression <- lm(Q34~Gender, data = wave2)
screenreg(gender_Q34_regression)
```

I'm getting the warning message:

> In `[<-.factor`(`*tmp*`, wave2$Q34_simple == 1, value = c(NA_integer_,  :
>   invalid factor level, NA generated

Is this because I'm assigning a new value to a factor variable, but the new value is not a valid level of the factor? I think this has something to do with numeric levels vs string values, but I have no idea how to fix it.

Thank you!

MarekGierlinski · April 6, 2023, 11:02am

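Your first command replaces the column Q34 with one in which every value is NA, so the original responses are gone. The second command then tries to replace 1 with "unwilling", but there is no 1 left in Q34, only NAs. On top of that, you create a factor with levels "1" and "2" and then try to assign the values "unwilling" and "willing", which are not among its levels. Assigning a value that is not a level of the factor is exactly what triggers the "invalid factor level, NA generated" warning.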
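A minimal reproduction of that warning, using a made-up three-element factor instead of the real `wave2` data (`codes`, `broken` and `relabelled` are hypothetical names). Assigning a label that is not one of the factor's levels stores `NA`, whereas `factor(..., labels = ...)` renames the existing levels, so no new level ever has to be created.

```r
# Toy codes standing in for Q34 (hypothetical data, not wave2)
codes <- factor(c("1", "2", "1"), levels = c("1", "2"))

# "unwilling" is not a level of `codes`, so R warns
# "invalid factor level, NA generated" and stores NA instead
broken <- codes
broken[broken == "1"] <- "unwilling"
print(broken)

# Renaming the levels avoids the problem: each existing level
# is mapped to a label, so no new level has to be created
relabelled <- factor(codes, levels = c("1", "2"),
                     labels = c("unwilling", "willing"))
print(relabelled)
```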

You need to use the existing column in your data set that contains the values 1 or 2. Let's say this column is called willingness. Then you can do:

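```r
# start from NA so that any other code (e.g. 97 = "don't know") stays missing
wave2$Q34 <- NA_character_
wave2$Q34[wave2$willingness == 1] <- "unwilling"
wave2$Q34[wave2$willingness == 2] <- "willing"
# explicit levels make "unwilling" the reference category
wave2$Q34 <- factor(wave2$Q34, levels = c("unwilling", "willing"))
```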

Alternatively, you can use the tidyverse (dplyr) for a more concise solution:

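```r
library(dplyr)  # case_match() needs dplyr >= 1.1.0

# case_match() returns NA for any code not listed (e.g. 97)
wave2 <- wave2 |>
  mutate(
    Q34 = case_match(
      willingness,
      1 ~ "unwilling",
      2 ~ "willing"
    ) |>
      factor(levels = c("unwilling", "willing"))
  )
```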
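With both variables stored as factors, `lm()` builds the dummy variables itself: under R's default treatment contrasts, a two-level factor becomes one 0/1 column for the non-reference level. The outcome needs separate handling, because `lm()` expects a numeric response. A minimal sketch, on invented and already-cleaned data with hypothetical column names `Gender_f`, `Q34_f` and `willing01`, codes willingness as 0/1 for a linear probability model and, as a common alternative, fits a logistic regression with `glm()`.

```r
# Invented, already-cleaned data (male/female, unwilling/willing only)
analysis <- data.frame(
  Gender_f = factor(c("male", "female", "male", "female", "female", "male"),
                    levels = c("male", "female")),
  Q34_f    = factor(c("unwilling", "willing", "willing", "unwilling",
                      "willing", "unwilling"),
                    levels = c("unwilling", "willing"))
)

# The dummy coding lm() uses internally: an intercept plus one
# 0/1 column that is 1 for female respondents
X <- model.matrix(~ Gender_f, data = analysis)
print(X)

# Linear probability model: outcome coded 0/1 (1 = willing)
analysis$willing01 <- as.integer(analysis$Q34_f == "willing")
lpm <- lm(willing01 ~ Gender_f, data = analysis)
print(coef(lpm))

# Logistic regression: glm() treats the first factor level
# ("unwilling") as failure and the other level as success
logit <- glm(Q34_f ~ Gender_f, family = binomial, data = analysis)
print(coef(logit))
```

In the linear probability model the intercept is the share of willing men and the `Gender_ffemale` coefficient is the difference in that share for women; the logistic model expresses the same comparison on the log-odds scale.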

system · May 18, 2023, 11:02am

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.
