Looking for a better way to do if_else inside group_modify

Working with a large dataset (> 1 million rows), I want to blank out a quality field if, for all occurrences of a given value of a key field, the quality field has a certain value (say, "bad").

I found a way to do it, but to me, it seems a bit inelegant to create and destroy variables in a dataframe.

Is there a better way?

By the way, if_else was a lifesaver in helping to debug what was going on. ifelse is a trap.


library(dplyr)

# If all values for a key are bad, set value to blank, otherwise leave alone
reprexdata <- tribble(
  ~key, ~value,
  "a" , "good",
  "a" , "bad",
  "a" , "good",
  "b" , "bad",
  "b" , "bad",
  "b" , "bad",
  "c" , "good",
  "c" , "good",
  "c" , "good"
)

# This fails: the condition is a single TRUE/FALSE per group, so ifelse()
# returns one element and clobbers the bad value for key=="a", while
# if_else() refuses to run
reprexdata %>% group_by(key) %>% 
  group_modify(~ {.x %>% mutate(
    value=if_else(nrow(.x)==sum(value=="bad", na.rm=TRUE), 
                 "", 
                 value))})
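
The failure above comes down to the condition being a single value per group. Here is a minimal sketch of that, using the three rows for key "a" as a plain vector (the names `value`, `test`, and `res` are illustrative only):

```r
library(dplyr)

value <- c("good", "bad", "good")              # the rows for key == "a"
test  <- length(value) == sum(value == "bad")  # a single FALSE

# Base ifelse() shapes its result like `test`, so only one element comes back;
# mutate() then recycles it over the whole group
ifelse(test, "", value)
#> [1] "good"

# dplyr::if_else() rejects the length mismatch instead of truncating silently
res <- tryCatch(if_else(test, "", value), error = function(e) "error")
```

The exact wording of the if_else error differs between dplyr versions, so the sketch only records that an error was raised.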

# This works, but seems a bit inelegant
reprexdata %>% group_by(key) %>% 
   group_modify(~ {.x %>% mutate(Number=nrow(.x), 
                                 Numbad=sum(value=="bad"))}) %>% 
  mutate(value=if_else(Number==Numbad, 
                      "", 
                      value)) %>% 
  select(-c(Number, Numbad))

I think creating a temporary flag variable is easier.

library(tidyverse)

reprexdata <- tribble(
    ~key, ~value,
    "a" , "good",
    "a" , "bad",
    "a" , "good",
    "b" , "bad",
    "b" , "bad",
    "b" , "bad",
    "c" , "good",
    "c" , "good",
    "c" , "good"
)

reprexdata %>% 
    group_by(key) %>%
    mutate(flag = all(value == "bad")) %>% 
    ungroup() %>% 
    mutate(value = if_else(flag, "", value)) %>% 
    select(-flag)
#> # A tibble: 9 x 2
#>   key   value 
#>   <chr> <chr> 
#> 1 a     "good"
#> 2 a     "bad" 
#> 3 a     "good"
#> 4 b     ""    
#> 5 b     ""    
#> 6 b     ""    
#> 7 c     "good"
#> 8 c     "good"
#> 9 c     "good"


Created on 2020-01-17 by the reprex package (v0.3.0.9000)

I think you can use the power of case_when, which is vectorized on both the LHS and the RHS. Applied within a mutate on grouped data, it gives what you want.

library(dplyr)
reprexdata <- tribble(
  ~key, ~value,
  "a" , "good",
  "a" , "bad",
  "a" , "good",
  "b" , "bad",
  "b" , "bad",
  "b" , "bad",
  "c" , "good",
  "c" , "good",
  "c" , "good"
)

reprexdata %>% 
  group_by(key) %>% 
  mutate(value = case_when(
    all(value == "bad") ~ "",
    TRUE ~ value
  )) %>%
  ungroup()
#> # A tibble: 9 x 2
#>   key   value 
#>   <chr> <chr> 
#> 1 a     "good"
#> 2 a     "bad" 
#> 3 a     "good"
#> 4 b     ""    
#> 5 b     ""    
#> 6 b     ""    
#> 7 c     "good"
#> 8 c     "good"
#> 9 c     "good"

Created on 2020-01-17 by the reprex package (v0.3.0.9001)
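
Since the test is a single TRUE/FALSE per group, a plain `if`/`else` inside the grouped mutate might also do the job. This is only a sketch of an alternative, not something taken from the replies above; wrapping the test in `isTRUE()` is my assumption, added so that a group containing NA does not make `if` fail.

```r
library(dplyr)

reprexdata <- tribble(
  ~key, ~value,
  "a" , "good",
  "a" , "bad",
  "a" , "good",
  "b" , "bad",
  "b" , "bad",
  "b" , "bad",
  "c" , "good",
  "c" , "good",
  "c" , "good"
)

# One scalar decision per group: "" is recycled over the group,
# otherwise the original values are returned unchanged
result <- reprexdata %>%
  group_by(key) %>%
  mutate(value = if (isTRUE(all(value == "bad"))) "" else value) %>%
  ungroup()
```

With the sample data, this should blank the three rows for key "b" and leave the others alone, the same as the case_when version.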


Sweet. I tested and it even works if one of the values is NA, which I should have included in the reprex.
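
For anyone wanting to check the NA behaviour, here is a small sketch with made-up data (`na_data` and key "d" are hypothetical, not from the original dataset). It assumes that a group mixing "bad" and NA should normally be left alone, and shows `na.rm = TRUE` as the variation if missing values should be ignored instead.

```r
library(dplyr)

# Hypothetical data: key "d" mixes "bad" with a missing value
na_data <- tribble(
  ~key, ~value,
  "b" , "bad",
  "b" , "bad",
  "d" , "bad",
  "d" , NA
)

# all() returns NA for key "d"; case_when() treats NA as "no match",
# so that group falls through to TRUE ~ value and is left alone
strict <- na_data %>%
  group_by(key) %>%
  mutate(value = case_when(all(value == "bad") ~ "", TRUE ~ value)) %>%
  ungroup()

# na.rm = TRUE ignores missing values, so key "d" is blanked as well
lenient <- na_data %>%
  group_by(key) %>%
  mutate(value = case_when(all(value == "bad", na.rm = TRUE) ~ "",
                           TRUE ~ value)) %>%
  ungroup()
```

One caveat with the `na.rm = TRUE` variant: a group whose values are all NA would also be blanked, because `all()` of an empty vector is TRUE.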
