I am trying to add a new column to my dataframe that displays either a red, yellow, or green "status" based upon the existence of a value within each group.
library(dplyr)
a <- c("A", "A", "B", "B", "C", "C")
b <- c(90,80,90,90,90,70)
df <- data.frame("Group" = a, "Value" = b)
> df
Group Value
1 A 90
2 A 80
3 B 90
4 B 90
5 C 90
6 C 70
If any value in a group is less than 80 I want each row of RYG to be red; if any value in a group is between 80 and 90 I want it to be yellow, and anything 90 or greater is green. I am trying to use any() to accomplish this but it seems to be overriding the group_by:
Warning messages:
1: In any(Value, na.rm = TRUE) :
coercing argument of type 'double' to logical
> df_2
# A tibble: 6 x 3
# Groups: Group [3]
Group Value RYG
<fct> <dbl> <chr>
1 A 90 RED
2 A 80 RED
3 B 90 RED
4 B 90 RED
5 C 90 RED
6 C 70 RED
Expected outcome is:
> df_2
# A tibble: 6 x 3
# Groups: Group [3]
Group Value RYG
<fct> <dbl> <chr>
1 A 90 YELLOW
2 A 80 YELLOW
3 B 90 GREEN
4 B 90 GREEN
5 C 90 RED
6 C 70 RED
I think in your problem logic, you are missing calculating a statistic based on values per group. As you stated the problem, why would group C be red instead of green? Based on your expected outcomes, it seems like you want to use the minimum Value per Group to determine the color.
I suggest trying out the dplyr::case_when function, combined with dplyr::between:
a <- c("A", "A", "B", "B", "C", "C")
b <- c(90,80,90,90,90,70)
df <- data.frame("Group" = a, "Value" = b)
library(tidyverse)
#> Warning: package 'tibble' was built under R version 3.5.2
df2 <- df %>%
group_by(Group) %>%
mutate(RYG = case_when(
min(Value) < 80 ~ "red",
between(min(Value), 80, 89) ~ "yellow",
TRUE ~ "green"
))
df2
#> # A tibble: 6 x 3
#> # Groups: Group [3]
#> Group Value RYG
#> <fct> <dbl> <chr>
#> 1 A 90 yellow
#> 2 A 80 yellow
#> 3 B 90 green
#> 4 B 90 green
#> 5 C 90 red
#> 6 C 70 red
zero or more logical vectors. Other objects of zero length are ignored, and the rest are coerced to logical ignoring any class.
As I mentioned earlier, you need an explicit trumping rule for when any value in a group is < 80 and any value is ALSO > 90 for example. The code below uses case_when in this way, but the rule is implicit in my ordering of your arguments, so red trumps yellow which trumps green. This essentially corresponds to applying a "min" to each value per group (which is what I did to create df2; for this reason, I think this code is clearer about what you are trying to accomplish, which was why I suggested it initially).
a <- c("A", "A", "B", "B", "C", "C")
b <- c(90,80,90,90,90,70)
df <- data.frame("Group" = a, "Value" = b)
library(tidyverse)
#> Warning: package 'tibble' was built under R version 3.5.2
df2 <- df %>%
group_by(Group) %>%
mutate(RYG = case_when(
min(Value) < 80 ~ "red",
between(min(Value), 80, 89) ~ "yellow",
TRUE ~ "green"
))
df3 <- df %>%
group_by(Group) %>%
mutate(RYG = case_when(
any(Value < 80, na.rm = TRUE) ~ "red",
any(Value >= 80, na.rm = TRUE) & any(Value <= 89, na.rm = TRUE) ~ "yellow",
TRUE ~ "green"
))
df3
#> # A tibble: 6 x 3
#> # Groups: Group [3]
#> Group Value RYG
#> <fct> <dbl> <chr>
#> 1 A 90 yellow
#> 2 A 80 yellow
#> 3 B 90 green
#> 4 B 90 green
#> 5 C 90 red
#> 6 C 70 red
all_equal(df2, df3)
#> [1] TRUE