Recode multiple categorical variables to new variables

I have a dataset with 11 variables describing ‘reasons for using e-cigarettes’ (ecig2crav, ecig2quit, ecig2symp, smokefree, exterior, bothering, rednoquit, red2quit, toxic5, cheaper5, cantstop). All are factor variables with 4 levels:

1=Not at all true,

2=Not very true,

3=Somewhat true,

4=Very true.

I want a function to create a new (labelled) factor variable which collapses these four categories into two: Not true (levels 1 and 2) and True (levels 3 and 4). How can I do this quickly in R? I want to create a new factor variable for each of the original variables. At the moment I have this code for the first two variables:

data %>%
mutate(ecig2crav_rec=recode(ecig2crav, "Not at all true"="Not true", "Not very true"="Not true", "Somewhat true"="True", "Very true"="True", .default = NA_character_),
ecig2quit_rec=recode(ecig2quit, "Not at all true"="Not true", "Not very true"="Not true", "Somewhat true"="True", "Very true"="True", .default = NA_character_))

I wonder if there is a way to avoid writing the recoding rules for each variable, as they all share the same rules. Thanks.

Polly

You can use mutate_at() or the new across() to recode several columns at the same time.

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

Thank you Andresrcs for your reply. I tried mutate_at(), but it does not generate new variables. Instead, it overwrites the original variables. But I want new variables rather than overwriting the existing variables.

Polly

You can also generate new variables: simply pass your function in a named list. As I said, if you need more specific help, please provide a proper reprex.

Here is some code to generate some sample data.

mydata <- data.frame(ecig2crav = c("Not at all true", "Not very true", "Somewhat true", "Very true"), ecig2quit = c("Not very true", "Somewhat true", "Very true", NA), ecig2symp = c("Somewhat true", "Very true", "Not at all true", "Not very true"))

mydata

## ecig2crav ecig2quit ecig2symp
## 1 Not at all true Not very true Somewhat true
## 2 Not very true Somewhat true Very true
## 3 Somewhat true Very true Not at all true
## 4 Very true <NA> Not very true

Now I want to recode each of the three variables with the same rules ("Not at all true" = "Not true"; "Not very true" = "Not true"; "Somewhat true" = "True"; "Very true" ="True", NA=NA) to NEW variables. My previous approach was as below:

mydata %>%
mutate(ecig2crav_rec=recode(ecig2crav, "Not at all true"="Not true", "Not very true"="Not true", "Somewhat true"="True", "Very true"="True", .default = NA_character_),
ecig2quit_rec=recode(ecig2quit, "Not at all true"="Not true", "Not very true"="Not true", "Somewhat true"="True", "Very true"="True", .default = NA_character_),
ecig2symp_rec=recode(ecig2symp, "Not at all true"="Not true", "Not very true"="Not true", "Somewhat true"="True", "Very true"="True", .default = NA_character_))

In this approach, I am repeating the rules for each variable which I would like to avoid. Thanks.

Polly

As I said, use a named list:

library(dplyr)

mydata <-
    data.frame(
        ecig2crav = c("Not at all true", "Not very true", "Somewhat true", "Very true"),
        ecig2quit = c("Not very true", "Somewhat true", "Very true", NA),
        ecig2symp = c("Somewhat true", "Very true", "Not at all true", "Not very true")
    )

mydata %>% 
    mutate_all(list(rec = ~ recode(.,
                                 "Not at all true"="Not true",
                                 "Not very true"="Not true",
                                 "Somewhat true"="True",
                                 "Very true"="True",
                                 .default = NA_character_)
                    )
               )
#>         ecig2crav     ecig2quit       ecig2symp ecig2crav_rec ecig2quit_rec
#> 1 Not at all true Not very true   Somewhat true      Not true      Not true
#> 2   Not very true Somewhat true       Very true      Not true          True
#> 3   Somewhat true     Very true Not at all true          True          True
#> 4       Very true          <NA>   Not very true          True          <NA>
#>   ecig2symp_rec
#> 1          True
#> 2          True
#> 3      Not true
#> 4      Not true

Created on 2020-06-01 by the reprex package (v0.3.0)

Hi Andresrcs,

I copied your code into RStudio and got the error message below. Am I doing anything wrong? Please bear with me, I am fairly new to R. Do I need to create a list before using mutate_all()? What if I only want to recode the first two variables? Thanks.

Error in recode(., Not at all true = "Not true", Not very true = "Not true", : unused arguments (Not at all true = "Not true", Not very true = "Not true", Somewhat true = "True", Very true = "True", .default = NA)

I figured out the error. Maybe my dplyr package was out of date. I reinstalled the package and it works now. Thank you very much. You have been very helpful!

Polly
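
On the side question above about recoding only the first two variables: the following is a minimal sketch, assuming dplyr 1.0.0 or later so that across() is available. The helper name collapse_true and the object name recoded are hypothetical and do not come from the thread. The helper also wraps the result in factor(), on the assumption that a two-level factor is what is wanted.

```r
library(dplyr)

mydata <- data.frame(
  ecig2crav = c("Not at all true", "Not very true", "Somewhat true", "Very true"),
  ecig2quit = c("Not very true", "Somewhat true", "Very true", NA),
  ecig2symp = c("Somewhat true", "Very true", "Not at all true", "Not very true")
)

# Hypothetical helper: one shared recoding rule, returned as a two-level factor
collapse_true <- function(x) {
  out <- recode(as.character(x),
                "Not at all true" = "Not true",
                "Not very true"   = "Not true",
                "Somewhat true"   = "True",
                "Very true"       = "True",
                .default = NA_character_)
  factor(out, levels = c("Not true", "True"))
}

# Recode only the selected columns; .names adds new columns
# (ecig2crav_rec, ecig2quit_rec) instead of overwriting the originals
recoded <- mydata %>%
  mutate(across(c(ecig2crav, ecig2quit), collapse_true, .names = "{.col}_rec"))

names(recoded)
```

To cover all 11 variables, the column selection inside across() can be swapped for the full list of names or a tidyselect helper; ecig2symp is left untouched here because it is not in the selection.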

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.

Can I add one more question to this? After the recoding is done, how can I cross-tabulate multiple pairs of variables at the same time to check that my recoding has been done correctly? At the moment, I have to write one line of code for each pair of variables, which I don't think is very efficient, particularly as I have 11 pairs of variables to check. Is there a faster way to do this? Thanks.

with(data, table(ecig2crav, ecig2crav_rec))
with(data, table(ecig2quit, ecig2quit_rec))
with(data, table(ecig2symp, ecig2symp_rec))

Hi Polly,

This is a nice use case for the check_recode() function from the {finalfit} package.

Details on this package can be found here

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(finalfit)

mydata <-
  data.frame(
    ecig2crav = c("Not at all true", "Not very true", "Somewhat true", "Very true"),
    ecig2quit = c("Not very true", "Somewhat true", "Very true", NA),
    ecig2symp = c("Somewhat true", "Very true", "Not at all true", "Not very true")
  )

mydata %>% 
  mutate_all(list(rec = ~ recode(.,
                                 "Not at all true"="Not true",
                                 "Not very true"="Not true",
                                 "Somewhat true"="True",
                                 "Very true"="True",
                                 .default = NA_character_)
  )
  )->
mydata2

mydata2 %>% 
  check_recode()
#> $index
#> # A tibble: 3 x 2
#>   var1      var2         
#>   <chr>     <chr>        
#> 1 ecig2crav ecig2crav_rec
#> 2 ecig2quit ecig2quit_rec
#> 3 ecig2symp ecig2symp_rec
#> 
#> $counts
#> $counts[[1]]
#>         ecig2crav ecig2crav_rec n
#> 1 Not at all true      Not true 1
#> 2   Not very true      Not true 1
#> 3   Somewhat true          True 1
#> 4       Very true          True 1
#> 
#> $counts[[2]]
#>       ecig2quit ecig2quit_rec n
#> 1 Not very true      Not true 1
#> 2 Somewhat true          True 1
#> 3     Very true          True 1
#> 4          <NA>          <NA> 1
#> 
#> $counts[[3]]
#>         ecig2symp ecig2symp_rec n
#> 1 Not at all true      Not true 1
#> 2   Not very true      Not true 1
#> 3   Somewhat true          True 1
#> 4       Very true          True 1

Created on 2020-06-05 by the reprex package (v0.3.0)
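
If installing another package is not an option, a similar check can be sketched in base R. This is only an illustration, assuming every recoded column is named after its original with a _rec suffix, as in the output above. The toy data frame below stands in for mydata2, and the names rec_vars, orig_vars and crosstabs are hypothetical.

```r
# Toy data standing in for the recoded data frame (mydata2 in the thread)
mydata2 <- data.frame(
  ecig2crav     = c("Not at all true", "Not very true", "Somewhat true", "Very true"),
  ecig2crav_rec = c("Not true", "Not true", "True", "True"),
  ecig2quit     = c("Not very true", "Somewhat true", "Very true", NA),
  ecig2quit_rec = c("Not true", "True", "True", NA)
)

# Find every recoded column and derive the name of its original
rec_vars  <- grep("_rec$", names(mydata2), value = TRUE)
orig_vars <- sub("_rec$", "", rec_vars)

# One cross-tabulation per pair; useNA = "ifany" keeps missing values visible
crosstabs <- Map(
  function(orig, rec) {
    table(mydata2[[orig]], mydata2[[rec]], useNA = "ifany", dnn = c(orig, rec))
  },
  orig_vars, rec_vars
)

# The list is named after the original variables
crosstabs$ecig2crav
```

Printing crosstabs shows all tables at once; a correct recoding puts every non-zero count of an original level in a single recoded column.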

Hi Phiggins,

This worked perfectly. Thank you very much!

Polly
