How to use any() within mutate and group_by

dplyr

#1

I am trying to add a new column to my dataframe that displays either a red, yellow, or green "status" based upon the existence of a value within each group.

library(dplyr)

a <- c("A", "A", "B", "B", "C", "C")
b <- c(90,80,90,90,90,70)
df <- data.frame("Group" = a, "Value" = b)

> df
  Group Value
1     A    90
2     A    80
3     B    90
4     B    90
5     C    90
6     C    70

If any value in a group is less than 80 I want each row of RYG to be red; if any value in a group is between 80 and 90 I want it to be yellow, and anything 90 or greater is green. I am trying to use any() to accomplish this but it seems to be overriding the group_by:

df_2 <- df %>% 
  group_by(Group) %>% 
  mutate(RYG = if_else(any(Value, na.rm = TRUE) < 90 & any(Value, na.rm = TRUE) >= 80, "YELLOW", if_else(any(Value, na.rm = TRUE) < 80, "RED", "GREEN")))
Warning messages:
1: In any(Value, na.rm = TRUE) :
  coercing argument of type 'double' to logical
> df_2
# A tibble: 6 x 3
# Groups:   Group [3]
  Group Value RYG  
  <fct> <dbl> <chr>
1 A        90 RED  
2 A        80 RED  
3 B        90 RED  
4 B        90 RED  
5 C        90 RED  
6 C        70 RED

Expected outcome is:

> df_2
# A tibble: 6 x 3
# Groups:   Group [3]
  Group Value RYG  
  <fct> <dbl> <chr>
1 A        90 YELLOW  
2 A        80 YELLOW  
3 B        90 GREEN  
4 B        90 GREEN 
5 C        90 RED  
6 C        70 RED

#2

I think in your problem logic, you are missing calculating a statistic based on values per group. As you stated the problem, why would group C be red instead of green? Based on your expected outcomes, it seems like you want to use the minimum Value per Group to determine the color.

I suggest trying out the dplyr::case_when function, combined with dplyr::between:

a <- c("A", "A", "B", "B", "C", "C")
b <- c(90,80,90,90,90,70)
df <- data.frame("Group" = a, "Value" = b)

library(tidyverse)
#> Warning: package 'tibble' was built under R version 3.5.2

df2 <- df %>% 
  group_by(Group) %>% 
  mutate(RYG = case_when(
    min(Value) < 80 ~ "red",
    between(min(Value), 80, 89) ~ "yellow",
    TRUE ~ "green"
  ))

df2
#> # A tibble: 6 x 3
#> # Groups:   Group [3]
#>   Group Value RYG   
#>   <fct> <dbl> <chr> 
#> 1 A        90 yellow
#> 2 A        80 yellow
#> 3 B        90 green 
#> 4 B        90 green 
#> 5 C        90 red   
#> 6 C        70 red

Created on 2019-01-29 by the reprex package (v0.2.1)


#3

Yes this solution works to get the expected outcome. But I am just trying to ask a more general question about why any() would not work in this case.


#4

Any needs a logical: https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/any

Usage
any(ā€¦, na.rm = FALSE)
Arguments

ā€¦

zero or more logical vectors. Other objects of zero length are ignored, and the rest are coerced to logical ignoring any class.

As I mentioned earlier, you need an explicit trumping rule for when any value in a group is < 80 and any value is ALSO > 90 for example. The code below uses case_when in this way, but the rule is implicit in my ordering of your arguments, so red trumps yellow which trumps green. This essentially corresponds to applying a "min" to each value per group (which is what I did to create df2; for this reason, I think this code is clearer about what you are trying to accomplish, which was why I suggested it initially).

a <- c("A", "A", "B", "B", "C", "C")
b <- c(90,80,90,90,90,70)
df <- data.frame("Group" = a, "Value" = b)

library(tidyverse)
#> Warning: package 'tibble' was built under R version 3.5.2

df2 <- df %>% 
  group_by(Group) %>% 
  mutate(RYG = case_when(
    min(Value) < 80 ~ "red",
    between(min(Value), 80, 89) ~ "yellow",
    TRUE ~ "green"
  ))

df3 <- df %>% 
  group_by(Group) %>% 
  mutate(RYG = case_when(
    any(Value < 80, na.rm = TRUE) ~ "red",
    any(Value >= 80, na.rm = TRUE) & any(Value <= 89, na.rm = TRUE) ~ "yellow",
    TRUE ~ "green"
  ))

df3
#> # A tibble: 6 x 3
#> # Groups:   Group [3]
#>   Group Value RYG   
#>   <fct> <dbl> <chr> 
#> 1 A        90 yellow
#> 2 A        80 yellow
#> 3 B        90 green 
#> 4 B        90 green 
#> 5 C        90 red   
#> 6 C        70 red

all_equal(df2, df3)
#> [1] TRUE

Created on 2019-01-30 by the reprex package (v0.2.1)


#5

Ahh I see. I needed to move my '<' and '>=' inside of any(). Thanks.

df_2 <- df %>% 
  group_by(Group) %>% 
  mutate(RYG = if_else(any(Value < 90, na.rm = TRUE)  & any(Value >= 80, na.rm = TRUE), "YELLOW", if_else(any(Value < 80, na.rm = TRUE), "RED", "GREEN")))

closed #6

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.