# Using case_when() over multiple columns

``````library(dplyr, warn = FALSE)

set.seed(1)

values <- c("1+1", "1+2", "1+3", "1+4", "1+5", "1+6")
Df <- tibble(ab = sample(values, 10, replace = TRUE),
ba = sample(values, 10, replace = TRUE),
cd = sample(values, 10, replace = TRUE),
dc = sample(values, 10, replace = TRUE),
de = sample(values, 10, replace = TRUE),
ed = sample(values, 10, replace = TRUE))

Df
#> # A tibble: 10 x 6
#>    ab    ba    cd    dc    de    ed
#>    <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 1+2   1+2   1+6   1+3   1+5   1+3
#>  2 1+3   1+2   1+2   1+4   1+4   1+6
#>  3 1+4   1+5   1+4   1+3   1+5   1+3
#>  4 1+6   1+3   1+1   1+2   1+4   1+2
#>  5 1+2   1+5   1+2   1+5   1+4   1+1
#>  6 1+6   1+3   1+3   1+5   1+5   1+1
#>  7 1+6   1+5   1+1   1+5   1+1   1+2
#>  8 1+4   1+6   1+3   1+1   1+3   1+4
#>  9 1+4   1+3   1+6   1+5   1+5   1+4
#> 10 1+1   1+5   1+3   1+3   1+5   1+3
``````

Created on 2018-10-29 by the reprex package (v0.2.1)

I need to change values in some data like this based on logical conditions. I could use a separate `case_when` for each separate column, but the real dataset has alot of columns so I'd prefer not to...The output should be numeric, example for column `ab`:

``````Df %>%
select(ab) %>%
mutate(ab = case_when(
ab == "1+1" | ab == "1+2" ~ 1,
ab == "1+3" | ab == "1+4" ~ 2,
ab == "1+5" | ab == "1+6" ~ 3
))
``````

Created on 2018-10-29 by the reprex package (v0.2.1)

I guess this is a job for purrr (or some scoped dplyr verb??), but I can't really figure out where the case_when() would fit in this context, especially since the logicals are different because of different column names. Any help is greatly appreciated!

Hey there, @bragks! Thankfully, you don't need to use `case_when()` separately for each column, and you don't need to use `purrr` to iterate over the columns. Instead, as you suspected, you can use the scoped variants of `mutate`: `mutate_if()` and `mutate_at()`

• `mutate_if()` lets you mutate all columns that return `TRUE` for a predicate function that you supply;
• `mutate_at()` lets you specify the columns you want to mutate using helpers like `starts_with()` or `one_of()`; and
• `mutate_all()` lets you mutate... all columns

When you use these scoped variants, you wrap the mutation function inside `funs()`, and then you have access to `.` to specify the original column. For example:

``````Df %>%
mutate_at(
vars(one_of('ab', 'de')),
funs(case_when(
. == "1+1" | . == "1+2" ~ 1,
. == "1+3" | . == "1+4" ~ 2,
. == "1+5" | . == "1+6" ~ 3)))
#> # A tibble: 10 x 6
#>       ab ba    cd    dc       de ed
#>    <dbl> <chr> <chr> <chr> <dbl> <chr>
#>  1     1 1+2   1+6   1+3       3 1+3
#>  2     2 1+2   1+2   1+4       2 1+6
#>  3     2 1+5   1+4   1+3       3 1+3
#>  4     3 1+3   1+1   1+2       2 1+2
#>  5     1 1+5   1+2   1+5       2 1+1
#>  6     3 1+3   1+3   1+5       3 1+1
#>  7     3 1+5   1+1   1+5       1 1+2
#>  8     2 1+6   1+3   1+1       2 1+4
#>  9     2 1+3   1+6   1+5       3 1+4
#> 10     1 1+5   1+3   1+3       3 1+3

Df %>%
mutate_all(
funs(case_when(
. == "1+1" | . == "1+2" ~ 1,
. == "1+3" | . == "1+4" ~ 2,
. == "1+5" | . == "1+6" ~ 3)))
#> # A tibble: 10 x 6
#>       ab    ba    cd    dc    de    ed
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     1     1     3     2     3     2
#>  2     2     1     1     2     2     3
#>  3     2     3     2     2     3     2
#>  4     3     2     1     1     2     1
#>  5     1     3     1     3     2     1
#>  6     3     2     2     3     3     1
#>  7     3     3     1     3     1     1
#>  8     2     3     2     1     2     2
#>  9     2     2     3     3     3     2
#> 10     1     3     2     2     3     2
``````

I hope that helps!

4 Likes

Just a heads-up about this: `funs()` are going to be deprecated in near future - https://github.com/tidyverse/dplyr/pull/3930. You can still use exactly the same approach, but your functions need to be wrapped into `list()` instead.

2 Likes

Thank you, @rensa! That `.` keeps dropping out of my memory.

I see that I forgot one part of my question:
After changing the values from each column, I need to add a new column containing the column NAME of the max value(s) for each observation. E.g. from your tibble this would be c("cd", "de") for obs1 and "ab" for obs3. Any ideas? I suspect this might not be as straight forward your first solution.

That's okay! Do you mean to say that you need a single new column containing the row-wise maxima (each element of the new column contains the maximum of that row)? If so, here's a bit of a wild way to do it (basically lifted from this SO answer, although I don't know whether I'm making things less safe by turning it into a one-liner with `names(.)`):

``````Df %>%
mutate_all(
funs(case_when(
. == "1+1" | . == "1+2" ~ 1,
. == "1+3" | . == "1+4" ~ 2,
. == "1+5" | . == "1+6" ~ 3))) %>%
mutate(rowmax = pmax(!!!rlang::syms(names(.))))
#> # A tibble: 10 x 7
#>       ab    ba    cd    dc    de    ed rowmax
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
#>  1     1     1     3     2     3     2      3
#>  2     2     1     1     2     2     3      3
#>  3     2     3     2     2     3     2      3
#>  4     3     2     1     1     2     1      3
#>  5     1     3     1     3     2     1      3
#>  6     3     2     2     3     3     1      3
#>  7     3     3     1     3     1     1      3
#>  8     2     3     2     1     2     2      3
#>  9     2     2     3     3     3     2      3
#> 10     1     3     2     2     3     2      3
``````

Ideally you'd replace `rlang::syms(names(.))` with something else that turns the aforementioned scoping functions into a list that can be unspliced in the same way, so that you can use this even if you have other columns that you want to exclude. But my `tidyeval` knowledge isn't quiiiiiiite there yet

Thank you again! Almost that, but not exactly, I had a look at that SO answer, but it's not exactly what I need. I don't know if this method is "correct", but I've almost manged to get what I need with dplyr, stringr and tidyr.

``````library(dplyr, warn = FALSE)
library(tidyr)
library(stringr)
set.seed(1)

values <- c("1+1", "1+2", "1+3", "1+4", "1+5", "1+6")
Df <- tibble(
ab = sample(values, 10, replace = TRUE),
ba = sample(values, 10, replace = TRUE),
cd = sample(values, 10, replace = TRUE),
dc = sample(values, 10, replace = TRUE),
de = sample(values, 10, replace = TRUE),
ed = sample(values, 10, replace = TRUE))

Df <- Df %>%
mutate_all(
funs(case_when(
. == "1+1" | . == "1+2" ~ 1,
. == "1+3" | . == "1+4" ~ 2,
. == "1+5" | . == "1+6" ~ 3))) %>%
mutate(id = 1:10)

Df %>% gather(key = "varname", value = "values", ab:ed) %>%
mutate(has_letter = case_when(
str_detect(varname, "a") ~ "has_a",
str_detect(varname, "c") ~ "has_c",
str_detect(varname, "e") ~ "has_e"
)) %>%
group_by(id, has_letter) %>%
summarise(maxval = max(values)) %>%
spread(key = has_letter, value = maxval) %>%
#Is it possible optimize the following case_when so I don't have to write out all possible conditions?
#I need the NAME of the column with the highest value, if more than one, I can combine, e.g "ac", "ce".
#The original dataset also has NA in some cols, so I would need a lot of different conditions with this method (I think...)
mutate(maxcolname = case_when(
has_a == has_c & has_a == has_e ~ "equal",
has_a > has_c & has_a > has_e ~ "a"
))

``````