average per factor with ggplot

Hello,

I'm starting out in R and could use a little help! I am trying to make a bar graph showing the average of a value for each level of a factor.

I have 2 columns:

Col1 with qualitative values to be treated as a factor: red, black, blue
Col2 with quantitative values, for which I want the average

Here is what I have so far, but I only get the global average...

library(ggplot2)
BDD$Col1 <- as.factor(BDD$Col1)
moy <- mean(BDD$Col2, na.rm = TRUE)
color_table <- data.frame(color = BDD$Col1, mean = moy)
ggplot(BDD, aes(x = Col1, fill = moy)) +
geom_bar() +
xlab("Color") +
ylab("Average by color")

With the desired result:
(aggregated) Associated mean
Red
Black
Blue
...

Thank you very much for your help!
Sincerely, :smiley:

I think the confusion here is that as.factor() only affects the specific column, converting it to a factor variable. It does not affect the data.frame as a whole, so mean(BDD$Col2) still returns a single global average. What you're looking for is a grouped data.frame, which you can get with the group_by() function.
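To make that concrete, here is a minimal sketch in base R. The toy values in BDD are made up for illustration (your real data will differ), and global_mean and group_means are just names I picked. It shows that converting Col1 to a factor leaves mean() unchanged, while tapply() returns one mean per level.

```r
# Toy data standing in for BDD (values are invented for illustration)
BDD <- data.frame(
  Col1 = c("red", "red", "black", "blue", "blue"),
  Col2 = c(1, 3, 10, 4, NA)
)

BDD$Col1 <- as.factor(BDD$Col1)

# Still a single number for the whole column: the factor is ignored here
global_mean <- mean(BDD$Col2, na.rm = TRUE)

# One mean per level of Col1 (base R, no extra packages needed)
group_means <- tapply(BDD$Col2, BDD$Col1, mean, na.rm = TRUE)

print(global_mean)
print(group_means)
```

The extra na.rm = TRUE is passed through tapply() to mean(), so missing values are dropped within each group.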

To create a table with the average of each group, you can use the following:

library(dplyr)
BDD %>%
  group_by(Col1) %>%
  summarise(Col2_mean = mean(Col2, na.rm = TRUE))

This gives you the table of averages, but it won't create the plot by itself.

In order to do that, you need to map the fill= aesthetic to the variable you want the colour to depend on, in this case Col1. Aesthetics are mappings from your data to visual properties, such as the x-axis (x=), the y-axis (y=), the fill of an object (fill=), its size (size=), its shape (shape=) and many more. So, in the aes() call inside ggplot(), we use fill = Col1 rather than fill = moy. Since you're using colour names as your factor levels, it's also useful to add scale_fill_identity() at the end; this makes the colour of each bar match the colour it is named after (try it without and see how confusing it is!).

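ggplot(BDD, aes(x = Col1, y = Col2, fill = Col1)) +
  # stat = "summary" with fun = "mean" plots the mean of Col2 per colour;
  # a bare geom_bar() would plot the number of rows per colour instead
  geom_bar(stat = "summary", fun = "mean") +
  xlab("Color") +
  ylab("Average by color") +
  scale_fill_identity()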
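
One thing to keep in mind: geom_bar() with only x mapped draws the number of rows per group, not an average. If you'd rather make the averaging explicit, a possible sketch is to build the summary table first and hand it to geom_col(), which uses the y value as the bar height. The toy BDD values and the name color_means are my own assumptions for illustration, not something from your data.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for BDD (values are invented for illustration)
BDD <- data.frame(
  Col1 = c("red", "red", "black", "blue", "blue"),
  Col2 = c(1, 3, 10, 4, NA)
)

# One row per colour, holding the mean of Col2 (NAs dropped)
color_means <- BDD %>%
  group_by(Col1) %>%
  summarise(Col2_mean = mean(Col2, na.rm = TRUE))

# geom_col() draws bars whose height is the y value, here the group mean
p <- ggplot(color_means, aes(x = Col1, y = Col2_mean, fill = Col1)) +
  geom_col() +
  xlab("Color") +
  ylab("Average by color") +
  scale_fill_identity()

print(p)
```

Keeping the summary in its own data.frame also means you can inspect the averages as a table before plotting them.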
