Collapse/Assign the same response options to multiple categorical variables

miakirk · March 25, 2023, 1:07am

I have lots of categorical variables that have the same response options (likert scale). I want to collapse the number of response options for all of them. How can I do this in bulk instead of one at a time?

I found a thread from 2020 on this site that included the code below (with my own data substituted in), but I am not able to get it to work. Any guidance would be appreciated.

A, B, C, D, E are my variable names.
Response options are on a 5- or 7-point Likert scale, and I would like to collapse them to a 3-point scale.

data1mod <- data1(
A = c("Strongly support","Support","Don't know/No opinion","Oppose","Strongly oppose"),
B = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
C = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
D = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
E = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"))

data1mod %>% mutate_all(list(rec = ~ recode(.,
"Strongly support","Support","Somewhat support"="Support",
"Don't know/No opinion"="Don't know/No opinion"
"Somewhat oppose","Oppose","Strongly oppose"="Oppose",
.default = NA_character_)))

This is the error I get:

Error in data1(A = c("Strongly support", "Support", "Don't know/No opinion",  :
  could not find function "data1"
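Two separate problems show up in the code above. R stops at data1(...) because no function with that name exists (data.frame() is what builds a data frame). Even with that fixed, recode() takes exactly one old value per old = new pair, so a line that lists three old values before a single = is not valid R, and a comma is also missing after the "Don't know/No opinion" line. A minimal sketch of what the attempt seems to be aiming for, assuming dplyr 1.1.0 or later, where case_match() accepts a vector of old values on the left of each formula. As with the .default = NA_character_ in the question, any value not listed becomes NA:

```r
library(dplyr)  # case_match() needs dplyr >= 1.1.0

# Small example data in the shape of the question, built with data.frame()
data1mod <- data.frame(
  A = c("Strongly support", "Support", "Don't know/No opinion", "Oppose", "Strongly oppose"),
  B = c("Strongly support", "Somewhat support", "Don't know/No opinion", "Somewhat oppose", "Strongly oppose")
)

# Each formula maps a set of old labels (left) to one new label (right);
# labels not listed in any formula become NA
collapsed <- data1mod %>%
  mutate(across(everything(), function(x) case_match(x,
    c("Strongly support", "Support", "Somewhat support") ~ "Support",
    "Don't know/No opinion" ~ "Don't know/No opinion",
    c("Somewhat oppose", "Oppose", "Strongly oppose") ~ "Oppose"
  )))

collapsed
```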

FJCC · March 25, 2023, 3:06am

Are you looking for something like this? Only "Somewhat support" and "Strongly oppose" get renamed, but that is easily adjusted.

data1mod <- data.frame(
  A = c("Strongly support","Support","Don't know/No opinion","Oppose","Strongly oppose","Strongly support","Strongly support"),
  B = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  C = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  D = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  E = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"))
library(dplyr)

data1mod %>% mutate(across(.cols = everything(), .fns = ~ recode(.,
                                            "Somewhat support"="Support",
                                            "Strongly oppose"="Oppose"
                                            )))
#>                       A                     B                     C
#> 1      Strongly support      Strongly support      Strongly support
#> 2               Support               Support               Support
#> 3 Don't know/No opinion               Support               Support
#> 4                Oppose Don't know/No opinion Don't know/No opinion
#> 5                Oppose       Somewhat oppose       Somewhat oppose
#> 6      Strongly support                Oppose                Oppose
#> 7      Strongly support                Oppose                Oppose
#>                       D                     E
#> 1      Strongly support      Strongly support
#> 2               Support               Support
#> 3               Support               Support
#> 4 Don't know/No opinion Don't know/No opinion
#> 5       Somewhat oppose       Somewhat oppose
#> 6                Oppose                Oppose
#> 7                Oppose                Oppose

Created on 2023-03-24 with reprex v2.0.2

miakirk · March 25, 2023, 5:22pm

Thanks for your reply.
I would like three response categories in total for each variable:
Support, DK, Oppose

FJCC · March 25, 2023, 6:44pm

To modify a level, add it to the items in the recode() function, with the old name on the left and the new name on the right. For example, to also change all the "Strongly support" items to "Support", change my original code

data1mod <- data1mod %>% mutate(across(.cols = everything(), .fns = ~ recode(.,
                                            "Somewhat support"="Support",
                                            "Strongly oppose"="Oppose"
                                            )))

to this

data1mod <- data1mod %>% mutate(across(.cols = everything(), .fns = ~ recode(.,
                                            "Somewhat support"="Support",
                                            "Strongly oppose"="Oppose",
                                            "Strongly support" = "Support"
                                            )))
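When the same old = new pairs are needed for many columns, one option is to write the mapping once as a named character vector (old labels as names, new labels as values) and splice it into recode() with !!!, which recode()'s ... argument accepts. A sketch; likert_key is a hypothetical name, and labels that are not in it are left unchanged:

```r
library(dplyr)

# Hypothetical lookup: names are the old labels, values are the new labels
likert_key <- c(
  "Strongly support" = "Support",
  "Somewhat support" = "Support",
  "Somewhat oppose"  = "Oppose",
  "Strongly oppose"  = "Oppose"
)

data1mod <- data.frame(
  B = c("Strongly support", "Support", "Somewhat support", "Don't know/No opinion",
        "Somewhat oppose", "Oppose", "Strongly oppose")
)

# !!! splices the named vector into recode() as separate old = new pairs;
# labels absent from likert_key ("Support", "Oppose", "Don't know/No opinion") are kept
data1mod <- data1mod %>%
  mutate(across(everything(), function(x) recode(x, !!!likert_key)))

table(data1mod$B)
```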

miakirk · March 25, 2023, 7:16pm

Okay, I will give this a shot now.

miakirk · March 25, 2023, 7:44pm

Hmm, I am not sure that it worked. I do not get any errors, but when I double-check the variables, the console still shows the original 7 response options (see code below).

Another question: this method created a new dataset called data1mod (with 4 variables and 7 obs). My original dataset is called data1 (with 49 variables and 567 obs).

How do I apply what I am doing with variables B, C, D, E to my original dataset (data1), so that I still have all 567 obs for B, C, D, E but with the new reduced response options?

Thank you for your help.

data1mod <- data.frame(
  B = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  C = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  D = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  E = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"))

data1mod %>% mutate(across(.cols = everything(), .fns = ~ recode(.,
                                                    "Somewhat support" = "Support",
                                                    "Support" = "Support",
                                                    "Strongly support" = "Support",
                                                    "Don't know/No opinion" = "Don't know/No opinion",
                                                    "Somewhat oppose" = "Oppose",
                                                    "Oppose" = "Oppose",
                                                    "Strongly oppose" = "Oppose")))

table(data1mod$B)

Don't know/No opinion                Oppose       Somewhat oppose      Somewhat support       Strongly oppose      Strongly support 
                    1                     1                     1                     1                     1                     1 
              Support 
                    1 

FJCC · March 25, 2023, 8:33pm

Have you stored the result of the mutate() process in a variable? In the example below, I name the original data data1 and I use your code to reduce the original levels into three levels. Your code works as I would expect and achieves what I think you want.

data1 <- data.frame(
  A = c("Strongly support","Support","Don't know/No opinion","Oppose","Strongly oppose","Strongly support","Strongly support"),
  B = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  C = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  D = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  E = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"))
library(dplyr)
data1 <- data1 %>% 
  mutate(across(.cols = everything(), .fns = ~ recode(.,
                                                      "Somewhat support" = "Support",
                                                      "Support" = "Support",
                                                      "Strongly support" = "Support",
                                                      "Don't know/No opinion" = "Don't know/No opinion",
                                                      "Somewhat oppose" = "Oppose",
                                                      "Oppose" = "Oppose",
                                                      "Strongly oppose" = "Oppose")))

data1
#>                       A                     B                     C
#> 1               Support               Support               Support
#> 2               Support               Support               Support
#> 3 Don't know/No opinion               Support               Support
#> 4                Oppose Don't know/No opinion Don't know/No opinion
#> 5                Oppose                Oppose                Oppose
#> 6               Support                Oppose                Oppose
#> 7               Support                Oppose                Oppose
#>                       D                     E
#> 1               Support               Support
#> 2               Support               Support
#> 3               Support               Support
#> 4 Don't know/No opinion Don't know/No opinion
#> 5                Oppose                Oppose
#> 6                Oppose                Oppose
#> 7                Oppose                Oppose
table(data1$B)
#> 
#> Don't know/No opinion                Oppose               Support 
#>                     1                     3                     3

Created on 2023-03-25 with reprex v2.0.2
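The difference between printing a result and storing it can be checked directly. In this sketch (the id column and the helper collapse_likert() are hypothetical, added for illustration), the bare pipe leaves data1 as it was, while data1 <- ... replaces it with a version that has the same rows and columns but collapsed labels in B. Using across(B, ...) instead of everything() restricts the recode to the named columns, so a column such as id is untouched:

```r
library(dplyr)

# Hypothetical helper holding the old = new pairs; unmatched labels pass through unchanged
collapse_likert <- function(x) {
  recode(x,
         "Strongly support" = "Support",
         "Somewhat support" = "Support",
         "Somewhat oppose"  = "Oppose",
         "Strongly oppose"  = "Oppose")
}

data1 <- data.frame(
  id = 1:4,  # hypothetical column that should not be recoded
  B  = c("Strongly support", "Somewhat support", "Don't know/No opinion", "Strongly oppose")
)

# 1. The result is only printed; data1 itself is not modified
data1 %>% mutate(across(B, collapse_likert))
before <- data1$B

# 2. The result is assigned back, so data1 now holds the collapsed labels
data1 <- data1 %>% mutate(across(B, collapse_likert))
after <- data1$B
```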

miakirk · March 25, 2023, 8:39pm

I'm not sure what you mean by storing the results of the mutate process in a variable.

I thought this method would be faster and more efficient, but since I am unfamiliar with mutate() and recode(), it is not actually faster for me. So I think I will change tactics and use factor() with levels and labels instead. Do you know of a way to apply the same levels and labels to multiple variables (columns)? Right now I am doing it one by one, which works, but I know there has to be a more efficient way.

data1$A <- factor(data1$A,
                  levels = c("Strongly support","Support","Don't know/No opinion","Oppose","Strongly oppose"),
                  labels = c("Support","Support","Don't know/No opinion","Oppose","Oppose"))

data1$B <- factor(data1$B,
                  levels = c("Strongly support","Support","Somewhat support","Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
                  labels = c("Support","Support","Support","Don't know/No opinion","Oppose","Oppose","Oppose"))

data1$C <- factor(data1$C,
                  levels = c("Strongly support","Support","Somewhat support","Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
                  labels = c("Support","Support","Support","Don't know/No opinion","Oppose","Oppose","Oppose"))

This is me trying to apply it to columns 3 through 6, but this converted all responses to NA for those columns. I feel like I am close, but something is off.

data1[ ,3:6] <- factor(data1[ ,3:6],
                       levels = c("Strongly support","Support","Somewhat support","Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
                       labels = c("Support","Support","Support","Don't know/No opinion","Oppose","Oppose","Oppose"))
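The NA values come from passing a whole data frame to factor(), which works on one vector at a time: it coerces the data frame to one character string per column, none of those strings match the levels, and every value becomes NA. One base R way to reuse the same levels and labels is to run factor() on each column with lapply() and assign the resulting list back into those columns. A sketch with hypothetical object names (likert_levels, likert_labels, cols). It relies on factor() merging duplicated labels into a single level, which R supports from version 3.4.0, so each column ends up with three levels. Because the 5-point responses here are a subset of the 7-point ones, the same levels also work for a column like A:

```r
# Shared 7-point levels and their collapsed labels (hypothetical object names).
# Duplicated labels are merged into one level (supported since R 3.4.0).
likert_levels <- c("Strongly support", "Support", "Somewhat support",
                   "Don't know/No opinion", "Somewhat oppose", "Oppose", "Strongly oppose")
likert_labels <- c("Support", "Support", "Support",
                   "Don't know/No opinion", "Oppose", "Oppose", "Oppose")

data1 <- data.frame(
  A = c("Strongly support", "Oppose", "Support"),  # 5-point item
  B = c("Strongly support", "Somewhat oppose", "Don't know/No opinion"),
  C = c("Somewhat support", "Oppose", "Strongly oppose"),
  D = c("Support", "Don't know/No opinion", "Strongly oppose")
)

cols <- c("A", "B", "C", "D")
# lapply() calls factor() on each column separately, passing levels and labels through;
# assigning the resulting list to data1[cols] replaces just those columns
data1[cols] <- lapply(data1[cols], factor,
                      levels = likert_levels, labels = likert_labels)

levels(data1$B)  # "Support", "Don't know/No opinion", "Oppose"
```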

FJCC · March 25, 2023, 9:23pm

If I run

data1 %>% 
  mutate(across(.cols = everything(), .fns = ~ recode(.,
                                                      "Somewhat support" = "Support",
                                                      "Support" = "Support",
                                                      "Strongly support" = "Support",
                                                      "Don't know/No opinion" = "Don't know/No opinion",
                                                      "Somewhat oppose" = "Oppose",
                                                      "Oppose" = "Oppose",
                                                      "Strongly oppose" = "Oppose")))

nothing in data1 changes. The result of the code is printed in the console but data1 is unchanged. If I run

data1 <- data1 %>% 
  mutate(across(.cols = everything(), .fns = ~ recode(.,
                                                      "Somewhat support" = "Support",
                                                      "Support" = "Support",
                                                      "Strongly support" = "Support",
                                                      "Don't know/No opinion" = "Don't know/No opinion",
                                                      "Somewhat oppose" = "Oppose",
                                                      "Oppose" = "Oppose",
                                                      "Strongly oppose" = "Oppose")))

then the result of the code is stored in a new version of data1. Notice the data1 <- at the beginning of the code. That is what stores the result. I thought you said that the code would run but nothing changed in the data frame. That made me suspect you were not storing the result in an existing or a new variable.
The code I showed with mutate(across()) is a method to apply one action to several columns. What you are doing with factor() seems dangerous. I expect your factor will have more than three levels but some of the labels will be duplicates.
What isn't working when you use the code I showed in my last post?

data1 <- data1 %>% 
  mutate(across(.cols = everything(), .fns = ~ recode(.,
                                                      "Somewhat support" = "Support",
                                                      "Support" = "Support",
                                                      "Strongly support" = "Support",
                                                      "Don't know/No opinion" = "Don't know/No opinion",
                                                      "Somewhat oppose" = "Oppose",
                                                      "Oppose" = "Oppose",
                                                      "Strongly oppose" = "Oppose")))
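One way to check whether a collapse actually worked on the real data is to list every value that is not one of the target labels. A small base R sketch; the allowed vector and the deliberately mistyped label are illustrative assumptions. With recode() an unmatched label survives unchanged and shows up here; with factor() it becomes NA instead, so counting NAs is the analogous check:

```r
# Hypothetical target labels after collapsing
allowed <- c("Support", "Don't know/No opinion", "Oppose")

# Illustrative data: one label has a stray space, as a mistyped mapping might leave behind
data1 <- data.frame(
  B = c("Support", "Oppose", "Don't know/ No opinion"),
  C = c("Support", "Support", "Oppose")
)

# Every distinct value in the recoded columns that is not an allowed label
leftover <- setdiff(unlist(data1[c("B", "C")], use.names = FALSE), allowed)
leftover
```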

EconProf · March 25, 2023, 9:58pm

If you only want to recode some of the columns, you need to specify them. For example, to recode columns B through D but not E, a simple way is B:D.

Because the replacements and original values are all character strings, you do not have to include "Support" = "Support", and the same goes for "Oppose": values that are not listed are simply left as they are. Do you still want DK?

library(tidyverse)

data1mod <- data.frame(
  B = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  C = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  D = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  E = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"))

data1mod <- data1mod %>% 
  mutate(across(.cols = B:D, .fns = ~ recode(., 
                                             "Somewhat support" = "Support",
                                             "Strongly support" = "Support",
                                             "Don't know/No opinion" = "DK",
                                             "Somewhat oppose" = "Oppose",
                                             "Strongly oppose" = "Oppose")))
data1mod
#>         B       C       D                     E
#> 1 Support Support Support      Strongly support
#> 2 Support Support Support               Support
#> 3 Support Support Support      Somewhat support
#> 4      DK      DK      DK Don't know/No opinion
#> 5  Oppose  Oppose  Oppose       Somewhat oppose
#> 6  Oppose  Oppose  Oppose                Oppose
#> 7  Oppose  Oppose  Oppose       Strongly oppose

Created on 2023-03-25 with reprex v2.0.2
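The pass-through behaviour described above is what recode() does when .default is not supplied and the replacements have the same type as the input. Supplying .default changes that: every non-missing value that is not named gets replaced, which is why the .default = NA_character_ in the original question would have turned any unlisted label into NA. A small sketch:

```r
library(dplyr)

x <- c("Strongly support", "Support", "Don't know/No opinion", "Oppose", NA)

# No .default: values not named ("Support", "Oppose") are returned unchanged; NA stays NA
kept <- recode(x, "Strongly support" = "Support", "Don't know/No opinion" = "DK")

# With .default: every non-missing value not named is replaced by NA_character_
dropped <- recode(x, "Strongly support" = "Support", .default = NA_character_)
```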

miakirk · March 25, 2023, 10:18pm

Hmm, I think adding the columns helped and makes sense, but this is still not working as I expected.

Is there a way to modify columns in the original data frame without creating a new data frame? I want to keep all observations in my original data frame (data1, 500+ obs) and just collapse the response options, not create a bunch of new data frames. It seems inefficient to have lots of data frames to call later when making a TableOne and doing other analyses. Am I missing something?

FJCC · March 25, 2023, 10:50pm

The code in my last post before this one shows how to do that.
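For comparison, the same in-place update can also be written in base R without dplyr: apply a lookup to the chosen columns with lapply() and assign the result back with data1[cols] <- .... Only those columns are replaced; every row and every other column stays in data1, and no extra data frame is kept around. A sketch with a hypothetical lookup vector likert_map and hypothetical columns; note that, unlike recode(), any label missing from the lookup becomes NA:

```r
# Hypothetical lookup: names are the original labels, values the collapsed ones
likert_map <- c(
  "Strongly support" = "Support", "Support" = "Support", "Somewhat support" = "Support",
  "Don't know/No opinion" = "DK",
  "Somewhat oppose" = "Oppose", "Oppose" = "Oppose", "Strongly oppose" = "Oppose"
)

data1 <- data.frame(
  id = 1:3,  # hypothetical column that is not recoded
  B  = c("Strongly support", "Don't know/No opinion", "Somewhat oppose"),
  C  = c("Oppose", "Somewhat support", "Strongly oppose")
)

cols <- c("B", "C")
# Indexing the named vector by the old labels returns the new labels;
# as.character() guards against factor columns, unname() drops the carried-over names
data1[cols] <- lapply(data1[cols], function(x) unname(likert_map[as.character(x)]))
```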

technocrat · March 26, 2023, 9:40am

Using @FJCC's data frame, I would do it along these lines:

d <- data.frame(
  A = c("Strongly support","Support","Don't know/No opinion","Oppose","Strongly oppose","Strongly support","Strongly support"),
  B = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  C = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  D = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"),
  E = c("Strongly support","Support","Somewhat support", "Don't know/No opinion","Somewhat oppose","Oppose","Strongly oppose"))

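# Drop the intensity prefixes so, e.g., "Strongly support" becomes "Support"
d$A <- sub("Strongly ","",d$A)
d$A <- sub("Somewhat ","",d$A)
# Title case for display only (printed, not stored back into d$A)
stringr::str_to_title(d$A)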
#> [1] "Support"               "Support"               "Don't Know/No Opinion"
#> [4] "Oppose"                "Oppose"                "Support"              
#> [7] "Support"
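The same prefix-stripping idea can be extended to every column at once by combining the two sub() calls into one regular expression and applying it with lapply(). A sketch, assuming every label is either a bare direction or a direction preceded by "Strongly " or "Somewhat "; it keeps the original capitalization and leaves "Don't know/No opinion" as it is:

```r
d <- data.frame(
  A = c("Strongly support", "Support", "Don't know/No opinion", "Oppose", "Strongly oppose"),
  B = c("Somewhat support", "Don't know/No opinion", "Somewhat oppose", "Oppose", "Strongly support")
)

# One regular expression removes either intensity prefix at the start of a label;
# d[] <- keeps d a data frame while replacing every column with its cleaned version
d[] <- lapply(d, function(x) sub("^(Strongly|Somewhat) ", "", x))
```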

miakirk · March 26, 2023, 4:24pm

Thank you so much for your help. I figured it out and it is running correctly. Now on to making my TableOne.
This will save me so much time in the future. Yay!
