map inside of case_when

ttrodrigz · October 6, 2021, 7:03pm

I am trying to use case_when to call different functions inside map on list-columns depending on the condition. For example, say I have two list-columns, the first element in each of the lists contains a vector of numbers, and the second contains a vector of characters.

library(tidyverse)

test.df <- tibble(
    has_numbers = c(TRUE, FALSE),
    a = list(1:3, "foo"),
    b = list(4:6, "bar")
)

> test.df
# A tibble: 2 x 3
  has_numbers a         b        
  <lgl>       <list>    <list>   
1 TRUE        <int [3]> <int [3]>
2 FALSE       <chr [1]> <chr [1]>

I want to create a new list-column where if the row contains numbers it adds them up, otherwise it pastes the strings together. I would approach this problem by using a case_when inside mutate, and mapping the relevant function over the list columns based on the result of case_when.

test.df %>%
    mutate(result = case_when(
        has_numbers ~ map2(a, b, sum),
        !has_numbers ~ map2(a, b, paste)
    ))

However, it looks like the call to sum is being attempted whether or not the case_when evaluates to TRUE, as it returns this error:

Error: Problem with `mutate()` column `result`.
i `result = case_when(...)`.
x invalid 'type' (character) of argument

This is the same error you get when calling sum on two character strings, so I know that the map2 call with sum is being attempted for every row of the data, regardless of what case_when evaluates to.

> sum("foo", "bar")
Error in sum("foo", "bar") : invalid 'type' (character) of argument

Any help/ideas are appreciated, thanks!

jrmuirhead · October 6, 2021, 7:41pm

Hi @ttrodrigz
One approach I would take is to just use one function, but move the branching (has_numbers is TRUE vs FALSE) inside the function.

library("tidyverse")
  
test.df <- tibble(
      has_numbers = c(TRUE, FALSE),
      a = list(1:3, "foo"),
      b = list(4:6, "bar")
  )
  
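make_result <- function(has_numbers, x, y){
      # Plain if/else evaluates only the branch that applies and returns the
      # full result (ifelse() with a scalar test keeps just the first element)
      if (isTRUE(has_numbers)) sum(x, y, na.rm = TRUE) else paste(x, y)
}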
  
test.df %>%
    mutate(result = pmap(.l = list(has_numbers, a, b), ~make_result(..1, ..2, ..3)))
#> # A tibble: 2 × 4
#>   has_numbers a         b         result   
#>   <lgl>       <list>    <list>    <list>   
#> 1 TRUE        <int [3]> <int [3]> <int [1]>
#> 2 FALSE       <chr [1]> <chr [1]> <chr [1]>

Created on 2021-10-06 by the reprex package (v2.0.1)

arthur.t · October 7, 2021, 2:19am

Here's a different solution with inset

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.0.5
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

test.df <- tibble(
  has_numbers = c(TRUE, FALSE),
  a = list(1:3, "foo"),
  b = list(4:6, "bar")
)

test.df <- test.df %>%
  mutate(
    result = as.list(rep(0, nrow(.))), # initialize
    result = inset(result,  has_numbers, map2(a[ has_numbers], b[ has_numbers], sum)),
    result = inset(result, !has_numbers, map2(a[!has_numbers], b[!has_numbers], paste))
  ) 

test.df$result
#> [[1]]
#> [1] 21
#> 
#> [[2]]
#> [1] "foo bar"

Created on 2021-10-06 by the reprex package (v1.0.0)

ttrodrigz · October 7, 2021, 4:45pm

Thanks for this! This is also the route I have taken in the past to work through this kind of task.

I am still wondering, though, why case_when isn't behaving as I would have expected. I will mark this as a solution since it does solve the problem, but curious to know if anyone can shed light on why case_when is operating as it is.

arthur.t · October 7, 2021, 4:54pm

case_when (as I understand it) will evaluate the expression for each case on the entire vector. It isn't smart enough to only evaluate on the subset of vector elements specific to each case.

This limitation also applies to if_else.
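A small sketch of that behaviour, not taken from the thread: noisy_double and the calls counter are hypothetical helpers added only to count how many elements the right-hand side actually receives.

```r
library(dplyr)

calls <- 0

# Hypothetical helper: doubles its input and records how many
# elements it was asked to process.
noisy_double <- function(v) {
  calls <<- calls + length(v)
  v * 2
}

x <- c(1, 2, 3, 4)

# Only two elements satisfy x > 2, but the right-hand side is
# evaluated on all of x before case_when picks values by position.
out <- case_when(
  x > 2 ~ noisy_double(x),
  TRUE  ~ x
)

calls
#> [1] 4
```

In the original example this is why sum gets called on the "foo"/"bar" row even though has_numbers is FALSE there.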

There is another option: map_if, which applies a function only to the elements that meet a condition. However, there is no map2_if or pmap_if, so it only works on a single input list.

What then to do? These are the possibilities I can see.

1. Write a function that detects a condition of the element and returns the appropriate result (jrmuirhead's solution)
2. Use map_if in a way that multiple input arguments are packed into a list of lists and the function pulls the individual input arguments out of the list
3. Assign in a "base R" way like df$result[condition] <- fn(df$input[condition])
4. Assign in a similar way with magrittr::inset, which is more pipe-friendly
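
For what it's worth, here is a rough sketch of the map_if and base R possibilities using the data from the question. The names packed, result and result2 are just placeholders, and the .else argument assumes purrr 0.3.0 or later.

```r
library(purrr)

has_numbers <- c(TRUE, FALSE)
a <- list(1:3, "foo")
b <- list(4:6, "bar")

# map_if route: pack each row's inputs into one list so that map_if,
# which takes a single input, can still see both values.
packed <- map2(a, b, ~ list(x = .x, y = .y))

result <- map_if(
  packed,
  has_numbers,                    # logical vector: TRUE elements get .f
  ~ sum(.x$x, .x$y),
  .else = ~ paste(.x$x, .x$y)     # FALSE elements get .else
)

# Base R route: subset first, then assign into the matching positions,
# so each function only ever sees the rows it applies to.
result2 <- vector("list", length(a))
result2[has_numbers]  <- map2(a[has_numbers],  b[has_numbers],  sum)
result2[!has_numbers] <- map2(a[!has_numbers], b[!has_numbers], paste)
```

Both routes avoid calling sum on the character row, because the condition is applied before the function runs rather than after.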
