I'm working with a large dataset (more than 1 million rows) and want to blank out a quality field whenever every occurrence of a given key value has the same quality value ("bad", say).
I found a way to do it, but creating and then dropping temporary columns in the data frame seems inelegant.
Is there a better way?
By the way, dplyr's if_else() was a lifesaver for debugging what was going on. Base R's ifelse() is a trap, because it silently returns a result shaped like its test.
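To illustrate the difference, here is a minimal sketch outside any data frame. The names `values`, `all_bad`, `base_result` and `strict_result` are made up for this illustration, and it assumes dplyr is installed. It passes a single TRUE/FALSE as the test, which is what a per-group check produces.

```r
library(dplyr)

values  <- c("good", "bad", "good")
all_bad <- FALSE  # a single TRUE/FALSE, like a per-group test

# Base ifelse() shapes its result like the test, so only values[1] survives.
base_result <- ifelse(all_bad, "", values)

# dplyr::if_else() refuses the size mismatch; capture the error to show it.
strict_result <- tryCatch(
  if_else(all_bad, "", values),
  error = function(e) "error"
)
```

Inside a grouped mutate() the length-1 result from ifelse() is recycled across the group, which is how a "bad" row can quietly turn into "good".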
library(dplyr)   # %>%, group_by(), group_modify(), mutate(), if_else(), select()
library(tibble)  # tribble()

# If all values of key are bad, set value to blank, otherwise leave alone
reprexdata <- tribble(
  ~key, ~value,
  "a" , "good",
  "a" , "bad",
  "a" , "good",
  "b" , "bad",
  "b" , "bad",
  "b" , "bad",
  "c" , "good",
  "c" , "good",
  "c" , "good"
)
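# This fails. The condition is a single TRUE/FALSE per group, so with ifelse
# the result is only the first element of value (clobbering the bad value for
# key=="a"), and with if_else it refuses to run because the sizes do not match
reprexdata %>%
  group_by(key) %>%
  group_modify(~ {
    .x %>% mutate(
      value = if_else(nrow(.x) == sum(value == "bad", na.rm = TRUE),
                      "",
                      value))
  })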
# This works, but seems a bit inelegant
reprexdata %>%
  group_by(key) %>%
  group_modify(~ {
    .x %>% mutate(Number = nrow(.x),
                  Numbad = sum(value == "bad"))
  }) %>%
  mutate(value = if_else(Number == Numbad,
                         "",
                         value)) %>%
  select(-c(Number, Numbad))
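For comparison, one possible way to avoid the temporary columns is to do the per-group test directly inside a grouped mutate(). This is only a sketch under some assumptions: dplyr is attached, `result` is a name introduced here, and NA values should not count as "bad" (hence `%in%` rather than `==`). It has not been timed on a million-row dataset.

```r
library(dplyr)

reprexdata <- tribble(
  ~key, ~value,
  "a", "good",
  "a", "bad",
  "a", "good",
  "b", "bad",
  "b", "bad",
  "b", "bad",
  "c", "good",
  "c", "good",
  "c", "good"
)

result <- reprexdata %>%
  group_by(key) %>%
  # all() gives one TRUE/FALSE per group. A plain if/else then returns either
  # a single "" (recycled across the group) or the untouched column.
  # %in% treats NA as "not bad", so a group containing NA is left alone.
  mutate(value = if (all(value %in% "bad")) "" else value) %>%
  ungroup()
```

A plain `if` is used instead of if_else() because the test is a single value per group, so there is nothing to vectorise over. Only the rows for key "b" should end up blank here.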