How to mutate multiple variables based on condition?

Here I am giving a survey dataset, where the response is recorded. I want a column to come beside which shows the score.
library(tidyverse)
library(janitor)
#>
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#>
#> chisq.test, fisher.test
nithin<-tibble::tribble(
~enumerator, ~l1c1, ~l1c2, ~l1c3, ~l1c4,
"PEN001", 1L, -99L, 1L, 1L,
"PEN002", 0L, 0L, 0L, 1L,
"PEN001", -99L, 0L, 1L, 0L,
"PEN001", 1L, 0L, -99L, 1L
)

The wanted result will come like this:

nithin2<-tibble::tribble(
~enumerator, ~l1c1, ~q1_score, ~l1c2, ~q2_score, ~l1c3, ~q3_score, ~l1c4, ~q4_score,
"PEN001", 1L, 2L, -99L, 0L, 1L, 2L, 1L, 2L,
"PEN002", 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L,
"PEN001", -99L, 0L, 0L, 0L, 1L, 2L, 0L, 0L,
"PEN001", 1L, 2L, 0L, 0L, -99L, 0L, 1L, 2L
)

It is not clear how you want to mutate the new variables, i.e. based on what condition - how is the score calculated?

From my perspective, it looks like when the column is 1, the score is 2. When the column is anything else, the score is zero. This can be achieved using dplyr:

nithin %>%
  mutate(across(-enumerator, 
                ~ if_else(.x == 1, 2, 0), 
                .names = "{.col}_score"))

This code is saying:

  • Mutate every column besides "enumerator".
  • Use the if_else function to check if the cell is 1 - if so label 2, otherwise label 0.
  • Don't mutate the columns themselves, create new columns with the name, e.g., "l1c1_score"

If you want the score column next to the value column, you can add this to the end of the pipeline:

 %>%  .[,order(colnames(.))]

the score has been set by the team that prepared the survey questionnaire. There is no underlying logic behind it.

Thanks. This is super-cool. But two things here:

  1. If the response is -99, can the values in the next column be "No response"?
  2. And what if the system of scoring is non-uniform. in the reprex above, the scoring is 2 marks for right answer, but in some questions, it will be 1.5 marks for the right answer (i.e, when the response is 1)?

You can't really mix-and-match data types in R data frame columns, but you could record it as "missing" by doing something like:

nithin <- tibble::tribble(
  ~enumerator, ~l1c1, ~l1c2, ~l1c3, ~l1c4,
  "PEN001", 1L, -99L, 1L, 1L,
  "PEN002", 0L, 0L, 0L, 1L,
  "PEN003", -99L, 0L, 1L, 0L,
  "PEN004", 1L, 0L, -99L, 1L
)

nithin %>%
  mutate(across(-enumerator, 
                ~ case_when(.x == 1 ~ 2, 
                            .x == -99 ~ NA_real_,
                            TRUE ~ 0), 
                .names = "{.col}_score")
         )  %>%  
  .[,order(colnames(.))]
  enumerator  l1c1 l1c1_score  l1c2 l1c2_score  l1c3 l1c3_score  l1c4 l1c4_score
  <chr>      <int>      <dbl> <int>      <dbl> <int>      <dbl> <int>      <dbl>
1 PEN001         1          2   -99         NA     1          2     1          2
2 PEN002         0          0     0          0     0          0     1          2
3 PEN003       -99         NA     0          0     1          2     0          0
4 PEN004         1          2     0          0   -99         NA     1          2

In this situation, you might have to change tactics entirely and do a bit of data reshaping and binding:

marks = tribble(
  ~question, ~marks,
  "l1c1", 2,
  "l1c2", 1.5,
  "l1c3", 2,
  "l1c4", 0.5
)

nithin %>% 
  pivot_longer(-enumerator, names_to = "question") %>% 
  left_join(marks) %>% 
  mutate(points = case_when(
    value == 1 ~ marks,
    value == -99 ~ NA_real_,
    TRUE ~ 0
  )) %>% 
  select(-marks) %>% 
  pivot_wider(names_from = question, 
              values_from = c(value, points), 
              names_glue = "{question}_{.value}") %>%  
  .[,order(colnames(.))]
  enumerator l1c1_points l1c1_value l1c2_points l1c2_value l1c3_points l1c3_value l1c4_points l1c4_value
  <chr>            <dbl>      <int>       <dbl>      <int>       <dbl>      <int>       <dbl>      <int>
1 PEN001               2          1          NA        -99           2          1         0.5          1
2 PEN002               0          0           0          0           0          0         0.5          1
3 PEN003              NA        -99           0          0           2          1         0            0
4 PEN004               2          1           0          0          NA        -99         0.5          1

This looks good. But can it come the other way, like the value comes first and then the points?

That's just an alphabetical thing; could change it to something like:

nithin %>% 
  pivot_longer(-enumerator, names_to = "question") %>% 
  left_join(marks) %>% 
  mutate(points = case_when(
    value == 1 ~ marks,
    value == -99 ~ NA_real_,
    TRUE ~ 0
  )) %>% 
  select(-marks) %>% 
  pivot_wider(names_from = question, 
              values_from = c(value, points), 
              names_glue = "{question}_{.value}") %>%  
  rename_with(~str_remove_all(.x, "_value")) %>% 
  .[,order(colnames(.))]
# A tibble: 4 x 9
  enumerator  l1c1 l1c1_points  l1c2 l1c2_points  l1c3 l1c3_points  l1c4 l1c4_points
  <chr>      <int>       <dbl> <int>       <dbl> <int>       <dbl> <int>       <dbl>
1 PEN001         1           2   -99          NA     1           2     1         0.5
2 PEN002         0           0     0           0     0           0     1         0.5
3 PEN003       -99          NA     0           0     1           2     0         0  
4 PEN004         1           2     0           0   -99          NA     1         0.5

This is great. Thanks a lot

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.