How to use group_by function for two variables?

sharmachetan · December 17, 2021, 8:11am

I was wondering if someone could please help me understand how I should put two variables (names, specie) into the code here.
Original code is below:

df <- df %>% 
  arrange(desc(names)) %>% 
  group_by(names) %>% 
  mutate(
    bar_labels = case_when(
      names == "Vermillion" ~ "ab",
      names == "Valery" ~ "e",
      names == "Rio Colorado" ~ "a",
      names == "Russian Banana" ~ "d",
      names == "Purple Majesty" ~ "cd",
      names == "POR12PG28-3" ~ "ab",
      names ==  "Masquerade" ~ "ab",
      names == "CO99076-6R" ~ "e",
      names == "CO05068-1RU" ~ "c",
      names == "Canela Russet" ~ "ab",
      names == "Atlantic" ~ "b",
      names == "AC99330-1P/Y" ~ "ab",
      TRUE ~ as.character(NA)
    ))

If I want to add another variable, say specie, under group_by, how should I do that? Would the code below be okay?

df <- df %>% 
  arrange(desc(names)) %>% 
  group_by(names,specie) %>% 
  mutate(
    bar_labels = case_when(
      names == "Vermillion" ~ "ab",
      names == "Valery" ~ "e",
      names == "Rio Colorado" ~ "a",
      names == "Russian Banana" ~ "d",
      names == "Purple Majesty" ~ "cd",
      names == "POR12PG28-3" ~ "ab",
      names ==  "Masquerade" ~ "ab",
      names == "CO99076-6R" ~ "e",
      names == "CO05068-1RU" ~ "c",
      names == "Canela Russet" ~ "ab",
      names == "Atlantic" ~ "b",
      names == "AC99330-1P/Y" ~ "ab",
      TRUE ~ as.character(NA)
    ))

By the way, specie is defined as:

specie = c(rep("Appearance", 12), rep("Aroma" , 12), rep("Flavor" , 12),
             rep("Overall" , 12), rep("Aftertaste", 12), rep("Texture", 12))

HanOostdijk · December 17, 2021, 8:48am

I am not sure why you relate the mutate to the group_by. I think they work more or less independently.
See this:

``` r
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df1 <- data.frame(
  names = c("Vermillion","Rio Grande","Rio Colorado"),
  specie = c("Appearance","Aroma","Flavor" )
)


df2 <- df1 %>% 
  arrange(desc(names)) %>% 
  group_by(names,specie) %>% 
  mutate(
    bar_labels = case_when(
      names == "Vermillion" ~ "ab",
      names == "Rio Colorado" ~ "a",
      TRUE ~ as.character(NA)
    )) %>%
  print()
#> # A tibble: 3 x 3
#> # Groups:   names, specie [3]
#>   names        specie     bar_labels
#>   <chr>        <chr>      <chr>     
#> 1 Vermillion   Appearance ab        
#> 2 Rio Grande   Aroma      <NA>      
#> 3 Rio Colorado Flavor     a

df3 <- df1 %>% 
  mutate(
    bar_labels = case_when(
      names == "Vermillion" ~ "ab",
      names == "Rio Colorado" ~ "a",
      TRUE ~ as.character(NA)
    )) %>%
  print()
#>          names     specie bar_labels
#> 1   Vermillion Appearance         ab
#> 2   Rio Grande      Aroma       <NA>
#> 3 Rio Colorado     Flavor          a
```

Created on 2021-12-17 by the reprex package (v2.0.1)

sharmachetan · December 30, 2021, 5:01am

Actually, I have a facet plot in which I want different text for each facet (Appearance, Aroma). The code above was assigning the same text to each bar in every facet, so I was wondering how I can specify separate text. I am trying to solve this problem, which I posted on Stack Overflow: r - How to add separate text onto each bar in facet wrap? - Stack Overflow

nirgrahamuk · December 30, 2021, 4:34pm

Simply extend your case_when logic to give different results based on specie as well as names, rather than on names alone. group_by is not necessary, as you are not (yet) summarising.
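
A minimal sketch of what that extension could look like, assuming a small stand-in data frame (`df_demo` is hypothetical; the label values are taken from the combinations posted later in this thread). Note that there is no group_by() step:

```r
library(dplyr)

# Hypothetical stand-in for the real data: two key columns only
df_demo <- data.frame(
  names  = c("Vermillion", "Vermillion", "Atlantic"),
  specie = c("Appearance", "Aroma", "Appearance")
)

df_demo <- df_demo %>%
  mutate(
    bar_labels = case_when(
      # each condition tests both columns; `&` combines the two tests
      names == "Vermillion" & specie == "Appearance" ~ "e",
      names == "Vermillion" & specie == "Aroma" ~ "d",
      # anything not listed above gets a missing label
      TRUE ~ NA_character_
    )
  )
```

Because every condition is evaluated row by row, the result is the same with or without grouping.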

sharmachetan · January 1, 2022, 12:16am

Yes, this is what I am trying to figure out: how do I specify two conditions under case_when? Like this (below), or some other way?

df <- df %>% 
  arrange(desc(names)) %>% 
  group_by(names) %>% 
  mutate(
    bar_labels = case_when(
      (names = "Vermillion" & specie = "Appearance") ~ "e"
      (names = "AC99330-1P/Y" & specie = "Texture") ~ "ab",
      TRUE ~ as.character(NA)
    ))

However, it produces an error:

Error: unexpected ',' in "      (names = "AC99330-1P/Y" & specie = "Texture") ~ "ab","
>       TRUE ~ as.character(NA)
TRUE ~ as.character(NA)
>     ))
Error: unexpected ')' in "    )"

nirgrahamuk · January 1, 2022, 2:06am

To me it seems a comma is missing after "e"

sharmachetan · January 1, 2022, 4:13am

It worked, but the code that comes later is not working. Here it is:

ggplot(data = df, mapping = aes(x = names, y = value)) +
  geom_col(position = "dodge") +
  coord_flip() +
  ylim(c(0,9)) +
  scale_y_continuous(breaks=seq(0.0, 9, 3), limits=c(0, 9), labels = c("0", "3", "6", "Like\nExtremely")) +
  labs(y = "", x = "") + theme(legend.title = element_blank(), axis.text.y = element_text(face = "bold", size = 11),
                               axis.text.x = element_text(face = "bold", size = 9)) +
  scale_fill_discrete(breaks = c("Appearance", "Aroma", "Flavor", "Overall", "Aftertaste", "Texture")) +
  facet_wrap(~h.names, labeller = labeller(h.names = names.labs)) +
  geom_text(aes(label = bar_labels, colour = "white", hjust = 1.7))

I think there is some mistake with geom_text:

Scale for 'y' is already present. Adding another scale for 'y', which will replace
the existing scale.
Error in FUN(X[[i]], ...) : object 'bar_labels' not found

nirgrahamuk · January 1, 2022, 9:17am

Look at geom_text(): hjust is a constant here, not an aesthetic mapped from your data, so it should go outside aes(). Review that.
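
As a rough illustration of that point, here is a small sketch with a hypothetical two-row data frame (`demo`; the values and labels are borrowed from the data posted later in the thread). The label comes from a column, so it stays inside aes(); colour and hjust are fixed values, so they are passed as ordinary arguments:

```r
library(ggplot2)

# Hypothetical miniature of the plotted data
demo <- data.frame(
  names = c("Vermillion", "Valery"),
  value = c(3.03, 4.69),
  bar_labels = c("e", "cd")
)

p <- ggplot(demo, aes(x = names, y = value)) +
  geom_col() +
  coord_flip() +
  # label is mapped from data, so it goes inside aes();
  # colour and hjust are constants, so they go outside aes()
  geom_text(aes(label = bar_labels), colour = "white", hjust = 1.7)
```

Writing colour = "white" inside aes() would instead map a one-level variable called "white" to the colour scale, which is why the text does not come out white in that case.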

valentingar · January 1, 2022, 6:09pm

As you are only matching values, a more efficient way would be to define bar_labels via a join. For this, create a second data frame containing the columns names (by the way, not a good column name, as it can be confused with the function names()), specie and bar_labels. Each row is a combination of names and specie and the corresponding bar_labels value. Then you can join this to your original df. The combination of specie and names will be matched and the corresponding bar_labels added to your df.

library(dplyr)

bar_label_map <- 
  data.frame(names = c("Vermillion", "AC99330-1P/Y"), # ... add further combinations
             specie = c("Appearance", "Texture"),
             bar_labels = c("e", "ab"))

df <- df %>% # df must contain columns `names` and `specie`!
  left_join(bar_label_map, by = c("names", "specie")) # now bar_labels will be added

As for your plot: are you sure that bar_labels is present in df? The error message appears to suggest otherwise.

sharmachetan · January 2, 2022, 4:34am

Any clue? I just noticed an error:

df <- df %>% 
  arrange(desc(names)) %>% 
  group_by(names) %>% 
  mutate(
    bar_labels = case_when(
      (names = "Vermillion" & specie = "Appearance") ~ "e",
      (names = "Vermillion" & specie = "Aroma") ~ "d",
      (names = "Vermillion" & specie = "Flavor") ~ "bc",
      ...
      (names = "AC99330-1P/Y" & specie = "Aftertaste") ~ "abc",
      (names = "AC99330-1P/Y" & specie = "Texture") ~ "ab",
TRUE ~ as.character(NA)
    ))

Error: Problem with `mutate()` column `bar_labels`.
i `bar_labels = case_when(...)`.
x target of assignment expands to non-language object
i The error occurred in group 1: names = "AC99330-1P/Y".
Run `rlang::last_error()` to see where the error occurred.

sharmachetan · January 2, 2022, 4:38am

Are you suggesting that I remove the rep function and spell out all the combinations?

names = c("Russian Banana", "Vermillion"),
  specie = c(rep("Appearance", 12), rep("Aroma" , 12), rep("Texture", 12)),
  condition = rep(c("Russian Banana", "Vermillion", "Canela Russet") , 6))

nirgrahamuk · January 2, 2022, 7:47am

= is for assignment or for passing parameters in function calls. For equality testing you need to use ==, i.e. the equals sign twice.
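
A short base R sketch of the difference, using a hypothetical three-row data frame `d`. With ==, each test returns a logical vector, and & combines the two tests element by element, which is what case_when() expects on the left of ~:

```r
# Hypothetical stand-in data
d <- data.frame(
  names  = c("Vermillion", "Vermillion", "Atlantic"),
  specie = c("Appearance", "Aroma", "Appearance")
)

# `==` compares element by element and returns TRUE/FALSE per row
is_match <- d$names == "Vermillion" & d$specie == "Appearance"
print(is_match)
#> [1]  TRUE FALSE FALSE
```

With a single =, R parses the expression as an assignment instead of a comparison, which leads to the "target of assignment" error shown above.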

sharmachetan · January 2, 2022, 8:00am

Thanks a lot. It ran. Now the only thing remaining is ggplot; it still shows an error:

Scale for 'y' is already present. Adding another scale for 'y', which will replace
the existing scale.
Error in FUN(X[[i]], ...) : object 'bar_label' not found

It ran fine until I included the geom_text line. It shows the facet wrap with no text on the bars at all. Following what you suggested earlier about hjust, I removed it and ran the code again, but it still shows an error:

ggplot(data = df, mapping = aes(x = names, y = value)) +
  geom_col(position = "dodge") +
  coord_flip() +
  ylim(c(0,9)) +
  scale_y_continuous(breaks=seq(0.0, 9, 3), limits=c(0, 9),
                     labels = c("0", "3", "6", "Like\nExtremely")) +
  labs(y = "", x = "") +
  theme(legend.title = element_blank(),
        axis.text.y = element_text(face = "bold", size = 11),
        axis.text.x = element_text(face = "bold", size = 9)) +
  scale_fill_discrete(breaks = c("Appearance", "Aroma", "Flavor",
                                 "Overall", "Aftertaste", "Texture")) +
  facet_wrap(~h.names, labeller = labeller(h.names = names.labs)) +
  geom_text(aes(label = bar_label, colour = "white", hjust = 1.7))

valentingar · January 2, 2022, 8:34am

> Are you suggesting that I remove the rep function and spell out all the combinations?

No, that's what you used to create your dataset. I am suggesting an alternative to case_when(): joining two datasets on their common columns with left_join(). In your original dataset there are rows with different combinations of names and specie. To add the bar_labels column, we can first create a sort of "map": a dataset with a bar_labels value for each combination of the other two columns. Joining it to the original dataset then adds the bar_labels column.
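
The "map" idea can be sketched end to end as follows. This is only an illustration: `df_demo` is a hypothetical three-row stand-in for the real data, and the map deliberately leaves out one combination to show what happens to rows without a match:

```r
library(dplyr)

# Hypothetical miniature of df with the two key columns
df_demo <- data.frame(
  names  = c("Vermillion", "AC99330-1P/Y", "Atlantic"),
  specie = c("Appearance", "Texture", "Aroma")
)

# One row per (names, specie) combination and its label
bar_label_map <- data.frame(
  names      = c("Vermillion", "AC99330-1P/Y"),
  specie     = c("Appearance", "Texture"),
  bar_labels = c("e", "ab")
)

# Rows are matched on both key columns; bar_labels is added to df_demo
df_demo <- df_demo %>%
  left_join(bar_label_map, by = c("names", "specie"))

# Combinations missing from the map end up with NA labels
unlabelled <- df_demo %>%
  filter(is.na(bar_labels))
```

Checking `unlabelled` after the join is a quick way to see which combinations still need an entry in the map.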

> It ran fine until I included the geom_text line. It shows the facet wrap with no text on the bars at all. Following what you suggested earlier about hjust, I removed it and ran the code again, but it still shows an error:

In the geom_text() call you write "bar_label" instead of "bar_labels". Probably just a typo?
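
One quick way to catch this kind of mismatch before plotting is to compare the column names the plot refers to with the columns that actually exist. A small base R sketch, assuming a hypothetical one-row data frame `d`:

```r
# Hypothetical stand-in for the data passed to ggplot()
d <- data.frame(names = "Vermillion", value = 3.03, bar_labels = "e")

# Columns referenced in the plot; "bar_label" is the misspelled one
wanted <- c("names", "value", "bar_label")

# setdiff() returns the requested names that are not columns of d
missing_cols <- setdiff(wanted, names(d))
print(missing_cols)
#> [1] "bar_label"
```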

sharmachetan · January 2, 2022, 8:53am

Thank you so much; yes, there was a typo. Silly of me. Thank you for the feedback as well. I was able to make the plot, but it is somehow writing the labels double; look here:

Is something wrong here?

df <- df %>% 
  arrange(desc(names)) %>% 
  group_by(names) %>% 
  mutate(
    bar_labels = case_when(
      (names == "Vermillion" & specie == "Appearance") ~ "e",
      (names == "Vermillion" & specie == "Aroma") ~ "d",
      (names == "Valery" & specie == "Appearance") ~ "cd",
      (names == "Valery" & specie == "Aroma") ~ "d",
      (names == "Valery" & specie == "Texture") ~ "ab",
      (names == "Rio Colorado" & specie == "Appearance") ~ "ab",
      (names == "Rio Colorado" & specie == "Aroma") ~ "ab",

nirgrahamuk · January 2, 2022, 12:20pm

If you aren't wrapping or otherwise splitting out your plot by specie, then for every name / y position there can be multiple bar labels.
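
A possible way to confirm this is to count how many rows share the same bar position. The sketch below assumes a hypothetical miniature of the plotted data (`plot_data`, a single facet with two varieties and two specie values); any position with more than one row will get more than one label drawn on top of each other:

```r
library(dplyr)

# Hypothetical miniature: one facet (h.names == "H1"),
# two varieties, each appearing once per specie value
plot_data <- data.frame(
  h.names = "H1",
  names = rep(c("Vermillion", "Valery"), times = 2),
  specie = rep(c("Appearance", "Aroma"), each = 2),
  bar_labels = c("e", "cd", "d", "d")
)

# Count rows per bar position; n_labels > 1 means stacked labels
overlaps <- plot_data %>%
  count(h.names, names, name = "n_labels") %>%
  filter(n_labels > 1)
```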

To this point a lot of code has been posted, rewritten, partially reposted etc. I think it would be helpful if you posted a complete reprex including example data, your transform and plotting code, if you want additional help with it.

sharmachetan · January 2, 2022, 10:29pm

Please find here complete code along with actual data:

df <- data.frame(
  H1 = c(6.36, 3.03, 6.85, 4.07, 4.69, 6.27, 6.67, 3.11, 5.07, 6.14, 5.93, 6.49),
  H2 = c(5.15, 5.00, 5.71, 5.50, 4.99, 5.81, 6.05, 5.76, 5.28, 5.69, 5.69, 5.06),
  H3 = c(3.85, 5.13, 4.99, 4.91, 5.01, 5.73, 5.77, 5.94, 5.57, 5.35, 6.00, 4.39),
  H4 = c(3.84, 4.80, 5.15, 4.85, 4.99, 5.73, 5.77, 5.45, 5.44, 5.41, 5.81, 4.46),
  H5 = c(4.08, 5.17, 4.77, 5.03, 5.00, 5.49, 5.49, 5.80, 5.51, 5.18, 5.76, 4.60),
  H6 = c(4.35, 5.59, 5.59, 4.83, 5.52, 5.63, 5.85, 5.74, 5.66, 5.19, 5.79, 4.84),
  fontface = c("bold"),
  names = c("Russian Banana", "Vermillion", "Atlantic", "POR12PG28-3",
            "Valery", "Rio Colorado", "CO99076-6R", "Purple Majesty",
            "AC99330-1P/Y", "CO05068-1RU", "Masquerade", "Canela Russet"),
  specie = c(rep("Appearance", 12), rep("Aroma" , 12), rep("Flavor" , 12),
             rep("Overall" , 12), rep("Aftertaste", 12), rep("Texture", 12)),
  condition = rep(c("Russian Banana", "Vermillion", "Atlantic", "POR12PG28-3",
                    "Valery", "Rio Colorado", "CO99076-6R", "Purple Majesty",
                    "AC99330-1P/Y", "CO05068-1RU", "Masquerade", 
                    "Canela Russet") , 6))

df <- df %>%
  pivot_longer(starts_with("H"), names_to = "h.names")
#> Error in df %>% pivot_longer(starts_with("H"), names_to = "h.names"): could not find function "%>%"

#one condition per plot
nameframe <- enframe(unique(df$h.names))
#> Error in enframe(unique(df$h.names)): could not find function "enframe"
specieframe <- enframe(unique(df$specie))
#> Error in enframe(unique(df$specie)): could not find function "enframe"
names.labs <- c("Appearance", "Aroma", "Flavor", "Overall",
                "Aftertaste", "Texture")
names(names.labs) <- c("H1", "H2", "H3", "H4", "H5", "H6")

#add text onto each bar
df <- df %>% 
  arrange(desc(names)) %>% 
  group_by(names) %>% 
  mutate(
    bar_labels = case_when(
      (names == "Vermillion" & specie == "Appearance") ~ "e",
      (names == "Vermillion" & specie == "Aroma") ~ "d",
      (names == "Vermillion" & specie == "Flavor") ~ "bc",
      (names == "Vermillion" & specie == "Overall") ~ "bcde",
      (names == "Vermillion" & specie == "Aftertaste") ~ "abcd",
      (names == "Vermillion" & specie == "Texture") ~ "ab",
      (names == "Valery" & specie == "Appearance") ~ "cd",
      (names == "Valery" & specie == "Aroma") ~ "d",
      (names == "Valery" & specie == "Flavor") ~ "abd",
      (names == "Valery" & specie == "Overall") ~ "cde",
      (names == "Valery" & specie == "Aftertaste") ~ "bcd",
      (names == "Valery" & specie == "Texture") ~ "ab",
      (names == "Rio Colorado" & specie == "Appearance") ~ "ab",
      (names == "Rio Colorado" & specie == "Aroma") ~ "ab",
      (names == "Rio Colorado" & specie == "Flavor") ~ "a",
      (names == "Rio Colorado" & specie == "Overall") ~ "abcd",
      (names == "Rio Colorado" & specie == "Aftertaste") ~ "abc",
      (names == "Rio Colorado" & specie == "Texture") ~ "ab",
      (names == "Russian Banana" & specie == "Appearance") ~ "ab",
      (names == "Russian Banana" & specie == "Aroma") ~ "bcd",
      (names == "Russian Banana" & specie == "Flavor") ~ "d",
      (names == "Russian Banana" & specie == "Overall") ~ "f",
      (names == "Russian Banana" & specie == "Aftertaste") ~ "e",
      (names == "Russian Banana" & specie == "Texture") ~ "c",
      (names == "Purple Majesty" & specie == "Appearance") ~ "e",
      (names == "Purple Majesty" & specie == "Aroma") ~ "abc",
      (names == "Purple Majesty" & specie == "Flavor") ~ "ab",
      (names == "Purple Majesty" & specie == "Overall") ~ "ab",
      (names == "Purple Majesty" & specie == "Aftertaste") ~ "a",
      (names == "Purple Majesty" & specie == "Texture") ~ "a",
      (names == "POR12PG28-3" & specie == "Appearance") ~ "d",
      (names == "POR12PG28-3" & specie == "Aroma") ~ "abcd",
      (names == "POR12PG28-3" & specie == "Flavor") ~ "bc",
      (names == "POR12PG28-3" & specie == "Overall") ~ "de",
      (names == "POR12PG28-3" & specie == "Aftertaste") ~ "abcd",
      (names == "POR12PG28-3" & specie == "Texture") ~ "bc",
      (names == "Masquerade" & specie == "Appearance") ~ "b",
      (names == "Masquerade" & specie == "Aroma") ~ "abcd",
      (names == "Masquerade" & specie == "Flavor") ~ "a",
      (names == "Masquerade" & specie == "Overall") ~ "a",
      (names == "Masquerade" & specie == "Aftertaste") ~ "ab",
      (names == "Masquerade" & specie == "Texture") ~ "a",
      (names == "CO99076-6R" & specie == "Appearance") ~ "ab",
      (names == "CO99076-6R" & specie == "Aroma") ~ "a",
      (names == "CO99076-6R" & specie == "Flavor") ~ "a",
      (names == "CO99076-6R" & specie == "Overall") ~ "abc",
      (names == "CO99076-6R" & specie == "Aftertaste") ~ "abc",
      (names == "CO99076-6R" & specie == "Texture") ~ "a",
      (names == "CO05068-1RU" & specie == "Appearance") ~ "ab",
      (names == "CO05068-1RU" & specie == "Aroma") ~ "abcd",
      (names == "CO05068-1RU" & specie == "Flavor") ~ "ab",
      (names == "CO05068-1RU" & specie == "Overall") ~ "abcd",
      (names == "CO05068-1RU" & specie == "Aftertaste") ~ "abcd",
      (names == "CO05068-1RU" & specie == "Texture") ~ "abc",
      (names == "Canela Russet" & specie == "Appearance") ~ "ab",
      (names == "Canela Russet" & specie == "Aroma") ~ "cd",
      (names == "Canela Russet" & specie == "Flavor") ~ "cd",
      (names == "Canela Russet" & specie == "Overall") ~ "ef",
      (names == "Canela Russet" & specie == "Aftertaste") ~ "de",
      (names == "Canela Russet" & specie == "Texture") ~ "bc",
      (names == "Atlantic" & specie == "Appearance") ~ "a",
      (names == "Atlantic" & specie == "Aroma") ~ "abc",
      (names == "Atlantic" & specie == "Flavor") ~ "abc",
      (names == "Atlantic" & specie == "Overall") ~ "cde",
      (names == "Atlantic" & specie == "Aftertaste") ~ "cde",
      (names == "Atlantic" & specie == "Texture") ~ "ab",
      (names == "AC99330-1P/Y" & specie == "Appearance") ~ "c",
      (names == "AC99330-1P/Y" & specie == "Aroma") ~ "bcd",
      (names == "AC99330-1P/Y" & specie == "Flavor") ~ "ab",
      (names == "AC99330-1P/Y" & specie == "Overall") ~ "abcd",
      (names == "AC99330-1P/Y" & specie == "Aftertaste") ~ "abc",
      (names == "AC99330-1P/Y" & specie == "Texture") ~ "ab",
      TRUE ~ as.character(NA)
    ))
#> Error in df %>% arrange(desc(names)) %>% group_by(names) %>% mutate(bar_labels = case_when((names == : could not find function "%>%"
#plot
ggplot(data = df, mapping = aes(x = names, y = value)) +
  geom_col(position = "dodge") +
  coord_flip() +
  ylim(c(0,9)) +
  scale_y_continuous(breaks=seq(0.0, 9, 3), limits=c(0, 9),
                     labels = c("0", "3", "6", "Like\nExtremely")) +
  labs(y = "", x = "") +
  theme(legend.title = element_blank(),
        axis.text.y = element_text(face = "bold", size = 11),
        axis.text.x = element_text(face = "bold", size = 9)) +
  scale_fill_discrete(breaks = c("Appearance", "Aroma", "Flavor",
                                 "Overall", "Aftertaste", "Texture")) +
  facet_wrap(~h.names, labeller = labeller(h.names = names.labs)) +
  geom_text(aes(label = bar_labels), colour = "white", hjust = 1.7)
#> Error in ggplot(data = df, mapping = aes(x = names, y = value)): could not find function "ggplot"

Created on 2022-01-03 by the reprex package (v2.0.1)

nirgrahamuk · January 4, 2022, 8:56am

The technical problem, the message about the y scale, is caused by first using ylim and then scale_y_continuous. Simply drop ylim in favour of the latter.
Your remaining issue is conceptual; I cannot solve it as I lack context for your task and data.
You simply have many values and labels that you are overplotting at a given name (x-axis position).
I don't know what you should be doing instead; to me that's a 'business' question rather than a technical one.
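
For what it is worth, here is one possible reading of the data, sketched on a reduced version of it (two varieties, two attributes; `df_small`, `df_long`, `df_one` and `label_map` are hypothetical names). It rests on an assumption: that H1, H2, ... hold the scores for the attributes named in names.labs, so each facet should keep only the rows whose specie matches that facet. Since specie is longer than the H columns, data.frame() recycles the scores, and every variety then appears once per specie value in every facet. The "could not find function" errors in the reprex also suggest the packages were not attached there, so the sketch loads them explicitly:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Reduced version of the posted data. `specie` has 4 values while the
# other columns have 2, so data.frame() recycles them to 4 rows.
df_small <- data.frame(
  H1 = c(6.36, 3.03),
  H2 = c(5.15, 5.00),
  names = c("Russian Banana", "Vermillion"),
  specie = c(rep("Appearance", 2), rep("Aroma", 2))
)
names.labs <- c(H1 = "Appearance", H2 = "Aroma")

# 4 rows x 2 H columns = 8 rows: each variety appears twice per facet
df_long <- df_small %>%
  pivot_longer(starts_with("H"), names_to = "h.names")

# Assumption: keep only rows whose specie matches the facet's attribute
df_one <- df_long %>%
  filter(specie == unname(names.labs[h.names]))

# Labels for each combination, taken from the case_when() in the thread
label_map <- data.frame(
  names = c("Russian Banana", "Russian Banana", "Vermillion", "Vermillion"),
  specie = c("Appearance", "Aroma", "Appearance", "Aroma"),
  bar_labels = c("ab", "bcd", "e", "d")
)
df_one <- df_one %>%
  left_join(label_map, by = c("names", "specie"))

# ylim() is dropped; scale_y_continuous() alone sets the limits
p <- ggplot(df_one, aes(x = names, y = value)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(breaks = seq(0, 9, 3), limits = c(0, 9),
                     labels = c("0", "3", "6", "Like\nExtremely")) +
  facet_wrap(~h.names, labeller = labeller(h.names = names.labs)) +
  geom_text(aes(label = bar_labels), colour = "white", hjust = 1.7)
```

After the filter there is exactly one row, and therefore one label, per bar in each facet. Whether this matches the intended analysis depends on what the H columns really represent.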

sharmachetan · January 5, 2022, 1:04am

Thank you so much for the help. My main goal is to show statistical differences on each bar. I ran the statistical test separately, and I wish to show its result through letters such as "abc" or "a". Each bar should carry one specific letter group, hence I specified one for each condition through:

(names == "Masquerade" & specie == "Texture") ~ "a",
(names == "CO99076-6R" & specie == "Appearance") ~ "ab",

But somehow each bar is now showing all the letters, I think. Also, I removed ylim but the problem is still there.
