Multistage Recoding with dplyr and across

dgtully · June 8, 2021, 3:42am

Hello, all.

I'm trying to create a function to automatically collapse labelled 4-point survey scales to binary variables and relabel them. Here's the test data I'm working with:

library(dplyr)

data <- bind_cols(favA = sample(c(1:4, 98, 99, 200), 10, replace = TRUE), 
                  favB = sample(c(1:4, 98, 99, 200), 10, replace = TRUE),
                  appC = sample(c(1:4, 98, 99, 200), 10, replace = TRUE)
)

I want to select a subset of the variables, recode them to make sure they are standard, and apply labels. Then I want to recode them again to collapse the 4-point scale to a 2-point scale and apply labels to the new variables. I took the syntax that I use to do this outside a function and tried to translate it into a function. I came up with this:

bicode <- function(df, vars) {
  df %>% 
    mutate(across({{vars}}, 
            .fns = list(
             ~dplyr::recode(., `1` = 1L, `2` = 2L, `3` = 3L, `4` = 4L,  `98` = 99L, `99` = 99L, .default = NA_integer_), 
             ~haven::labelled(., labels = c("Very unfavorable" = 1L, "Somewhat unfavorable" = 2L, "Somewhat favorable" = 3L, "Very favorable" = 4L, "DK/NA" = 99L))),
                  .names = "{.col}")) %>%
    mutate(across({{vars}},
            .fns = list(~dplyr::recode(., `1` = 1L, `2` = 1L, `3` = 2L, `4` = 2L,  `98` = 99L, `99` = 99L, .default = NA_integer_), 
            ~haven::labelled(., labels = c("Unfavorable" = 1L, "Favorable" = 2L, "DK/NA" = 99L))), 
                  .names = "{.col}2"))
  }

When I run any of the following, I get an error telling me that I have duplicate column names, as shown below:

data %>% bicode(vars(favA))
data %>% bicode(vars(favA:favB))
data %>% bicode(starts_with('fav'))

 Error: Problem with `mutate()` input `..1`.
ℹ `..1 = across(...)`.
x Names must be unique.
x These names are duplicated:
  * "favA2" at locations 1 and 2.
  * "favB2" at locations 3 and 4.
  * "favA_12" at locations 5 and 6.
  * "favA_22" at locations 7 and 8.
  * "favB_12" at locations 9 and 10.
  * ...
Run `rlang::last_error()` to see where the error occurred.

What do I need to do to a) replace the original columns with the first transformation and b) return new, unique columns with the second? Ideally, the output would look like this:

# A tibble: 10 x 5
     favA         favB      appC   favA2      favB2    
     <dbl+lbl>  <dbl+lbl>   <int> <dbl+lbl>   <dbl+lbl> 
 1     3 [Some…  200           2    2 [Favo…   NA
 2     4 [Very…   99 [DK/…    98    2 [Favo…   99 [DK/…
 3     1 [Very…    1 [Ver…     1    1 [Unfa…    1 [Unfa…
 4    99 [DK/N…    3 [Som…    99   99 [DK/…     2 [Favo…
 5     3 [Some…    4 [Ver…     2    2 [Favo…    2 [Favo…
 6    99 [DK/N…   NA         200   99 [DK/…    NA
 7     2 [Some…    4 [Ver…     1    1 [Unfa…    2 [Favo…
 8     4 [Very…    4 [Ver…     2    2 [Favo…    2 [Favo…
 9    99 [DK/N…    1 [Ver…    99   99 [DK/…     1 [Unfa…
10    99 [DK/N…    2 [Som…    99   99 [DK/…     1 [Unfa…

Any help would be much appreciated!

UPDATE: The first answer below helped me clarify my question:

I don't understand how to get the first mutate command to return the original variables rather than a new variable at var1. I thought by adding .names = "{.col}", the new variables would replace the old ones. Then in the second mutate, it would select the same variables it just mutated and then mutate them into new variables "var2" because .names = "{.col}2".

Thanks,

David

Rsky · June 8, 2021, 7:06am

In your code, the first mutate() may have caused the next mutate() to run on the newly generated columns as well.

I am not confident that I understand your algorithm correctly, but the following code ran without an error.

data %>%
  select(c('favA','favB')) %>% 
  mutate(across(c('favA','favB'), 
                .fns = list(recode=dplyr::recode),
                `1` = 1, 
                `2` = 2, 
                `3` = 3, 
                `4` = 4,
                `98` = 99, 
                `99` = 99,
                .names = "{.col}_1")) %>% 
  mutate(across(c('favA','favB'),
                .fns=list(haven=haven::labelled),
                labels = c("Very unfavorable" = 1, 
                           "Somewhat unfavorable" = 2, 
                           "Somewhat favorable" = 3, 
                           "Very favorable" = 4, 
                           "DK/NA" = 99),
                .names = "{.col}_2"))

    favA  favB favA_1 favB_1             favA_2             favB_2
   <dbl> <dbl>  <dbl>  <dbl>          <dbl+lbl>          <dbl+lbl>
 1     2     1      2      1   2 [Somewhat unf~   1 [Very unfavor~
 2   200     3    200      3 200                  3 [Somewhat fav~
 3   200   200    200    200 200                200

Extra options for the function go outside the function itself, as additional arguments to across():

f <- function(x, option = NULL) {}    # a function that takes an extra option
across(col_name, f, option = value)   # the option is passed after the function
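
To make the pattern above concrete, here is a minimal sketch using mean() and its na.rm argument (the data frame and the names out_dots and out_lambda are made up for illustration). It also shows the lambda form, since passing arguments through across()'s `...` was deprecated in dplyr 1.1.0 and may produce a warning there.

```r
library(dplyr)

# Hypothetical example data with some missing values.
df <- tibble(a = c(1, NA, 3), b = c(NA, 5, 6))

# Style from the reply above: the extra argument follows the function.
# Newer dplyr versions may warn that `...` in across() is deprecated.
out_dots <- df %>% summarise(across(c(a, b), mean, na.rm = TRUE))

# Lambda form: the extra argument is written inside the call itself.
out_lambda <- df %>% summarise(across(c(a, b), ~ mean(.x, na.rm = TRUE)))
```

Both calls should give the same column means; the lambda form is probably the safer choice for new code.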

If you use tidyselect to specify the column names, you will be able to write code that works well with across().
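
As a rough illustration of the tidyselect point, the sketch below wraps across() in a function and forwards the column selection with {{ }}. The helper negate_cols and the example data are hypothetical. Note that vars() belongs to the older scoped verbs such as mutate_at(); across() expects a tidyselect expression like starts_with("fav"), c(favA, appC), or favA:favB.

```r
library(dplyr)

# Hypothetical helper: flips the sign of whichever columns are selected.
negate_cols <- function(df, cols) {
  df %>% mutate(across({{ cols }}, ~ -.x))
}

df <- tibble(favA = 1:3, favB = 4:6, appC = 7:9)

res_helper <- negate_cols(df, starts_with("fav"))  # tidyselect helper
res_names  <- negate_cols(df, c(favA, appC))       # bare column names
res_range  <- negate_cols(df, favA:favB)           # a column range
```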

dgtully · June 8, 2021, 1:32pm

Thanks. Your code definitely works, but I want to make the function a little more generic.

I guess the first problem is that I don't understand how to get the first mutate command to return the original variables rather than a new variable at var_1. I thought by adding .names = "{.col}", the new variables would replace the old ones. Then in the second mutate, it would select the same variables it just mutated and then mutate them into new variables "var2" because .names = "{.col}2".
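
A small sketch of what seems to be going on, using a toy column x rather than the survey data (all names here are made up). A list of functions in across() is not a pipeline: each function is applied to the original column separately, so each result needs its own name, for example via {.fn}. With .names = "{.col}" and two functions, both results would get the same name, which would explain the duplicate-name error. Chaining the steps inside a single function lets .names = "{.col}" overwrite the column in place.

```r
library(dplyr)

df <- tibble(x = c(1, 2, 3))

# Two functions in a list: each one sees the ORIGINAL x, and each result
# gets its own column because the name template includes {.fn}.
out_list <- df %>%
  mutate(across(x, list(double = ~ .x * 2, neg = ~ -.x),
                .names = "{.col}_{.fn}"))

# One function that chains both steps: the result replaces x itself.
out_chain <- df %>%
  mutate(across(x, ~ -(.x * 2), .names = "{.col}"))
```

Here out_list keeps x and adds x_double and x_neg, where x_neg is the negation of the original x, not of x_double. out_chain has only x, doubled and then negated.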

dgtully · June 8, 2021, 1:43pm

The code's not cute, but by separating each command, I get what I want:

bicode <- function(df, vars) {
  df <- df %>% 
    mutate(across({{vars}},  .fns = list(~dplyr::recode(., `1` = 1L, `2` = 2L, `3` = 3L, `4` = 4L,  `98` = 99L, `99` = 99L, .default = NA_integer_)) ,
                    .names = "{.col}"))
  df <- df %>% 
    mutate(across({{vars}}, .fns = list(~haven::labelled(., labels = c("Very unfavorable" = 1L, "Somewhat unfavorable" = 2L, "Somewhat favorable" = 3L, "Very favorable" = 4L, "DK/NA" = 99L))),
                  .names = "{.col}"))
  df <- df %>% 
    mutate(across({{vars}}, .fns = list(~dplyr::recode(., `1` = 1L, `2` = 1L, `3` = 2L, `4` = 2L,  `98` = 99L, `99` = 99L, .default = NA_integer_)), 
                  .names = "{.col}2"))
  df <- df %>% 
    mutate(across({{vars}} & ends_with('2'), .fns = list(~haven::labelled(., labels = c("Unfavorable" = 1L, "Favorable" = 2L, "DK/NA" = 99L))), .names = "{.col}"))
  df
}

data %>% bicode(starts_with('fav'))

# A tibble: 10 x 5
                        favA                      favB  appC            favA2            favB2
                   <int+lbl>                 <int+lbl> <dbl>        <int+lbl>        <int+lbl>
 1 99 [DK/NA]                 1 [Very unfavorable]         2 99 [DK/NA]        1 [Unfavorable]
 2 NA                        NA                            1 NA               NA              
 3 99 [DK/NA]                 3 [Somewhat favorable]       4 99 [DK/NA]        2 [Favorable]  
 4 NA                         1 [Very unfavorable]       200 NA                1 [Unfavorable]
 5 99 [DK/NA]                99 [DK/NA]                    1 99 [DK/NA]       99 [DK/NA]      
 6  1 [Very unfavorable]     99 [DK/NA]                    2  1 [Unfavorable] 99 [DK/NA]      
 7  4 [Very favorable]       NA                            4  2 [Favorable]   NA              
 8  1 [Very unfavorable]     99 [DK/NA]                    1  1 [Unfavorable] 99 [DK/NA]      
 9  2 [Somewhat unfavorable]  2 [Somewhat unfavorable]     3  1 [Unfavorable]  1 [Unfavorable]
10 NA                         2 [Somewhat unfavorable]     1 NA                1 [Unfavorable]
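
For what it's worth, the four steps above could probably be folded into two across() calls by nesting recode() inside labelled() in a single lambda. This is only a sketch under some assumptions: bicode2, four_pt, and two_pt are made-up names, and haven::zap_labels() is used defensively in the second stage so that recode() receives a plain integer vector rather than a labelled one.

```r
library(dplyr)
library(haven)

# Hypothetical compact variant of bicode(); not tested on real survey data.
bicode2 <- function(df, vars) {
  four_pt <- c("Very unfavorable" = 1L, "Somewhat unfavorable" = 2L,
               "Somewhat favorable" = 3L, "Very favorable" = 4L, "DK/NA" = 99L)
  two_pt <- c("Unfavorable" = 1L, "Favorable" = 2L, "DK/NA" = 99L)

  df %>%
    # Stage 1: standardise the codes and label them, overwriting the originals.
    mutate(across({{ vars }}, ~ labelled(
      recode(.x, `1` = 1L, `2` = 2L, `3` = 3L, `4` = 4L,
             `98` = 99L, `99` = 99L, .default = NA_integer_),
      labels = four_pt))) %>%
    # Stage 2: collapse to two points in new columns with a "2" suffix.
    mutate(across({{ vars }}, ~ labelled(
      recode(zap_labels(.x), `1` = 1L, `2` = 1L, `3` = 2L, `4` = 2L,
             `99` = 99L, .default = NA_integer_),
      labels = two_pt), .names = "{.col}2"))
}

# Small fixed example so the result is reproducible.
toy <- tibble(favA = c(1, 3, 98, 200), favB = c(4, 2, 99, 1), appC = c(1, 2, 3, 4))
res <- bicode2(toy, starts_with("fav"))
```

One caveat: with a selection like starts_with("fav"), running the function a second time on its own output would also pick up favA2 and favB2, so it is meant to be applied once to the raw data.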
