Why are the means of my boxplots always the same in ggplot?

Howdy!

I have been playing around in RStudio with some public datasets that I downloaded off the internet. For some reason, when I try to create a boxplot with two categorical variables, the graph sometimes displays the mean bars as equal even though the means are different (as shown in the picture I attached). Can someone tell me why R keeps doing this?

FYI, PPGENDER (levels 1 & 2) and Industry (levels 5 & 16) are both categorical factors. SH is my continuous outcome (sorry for the poor variable names).


```r
library(ggplot2)

# Horizontal boxplots of SH by PPGENDER, filled by INDUSTRY_col
ggplot(data = Data) +
  geom_boxplot(aes(x = PPGENDER, y = SH, fill = INDUSTRY_col)) +
  coord_flip()
```

It's hard to give a proper answer without a reproducible example of your data. See the FAQ: How to do a minimal reproducible example (reprex) for beginners.


When I do that, it just comes up blank.

What @technocrat is trying to tell you is that it is hard to help you if you don't provide a proper REPRoducible EXample (reprex) illustrating your issue, including sample data in a copy-paste-friendly format.

Please click on the link and read the guide to learn how to provide a reproducible example.
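One common way to share sample data in a copy-paste-friendly format is base R's `dput()`, which prints R code that recreates an object. Below is a minimal sketch; the `toy` data frame is a hypothetical stand-in with made-up values, not your actual data.

```r
# Hypothetical stand-in for the poster's data frame (made-up values)
toy <- data.frame(
  PPGENDER     = factor(c(1, 2, 1, 2)),
  INDUSTRY_col = factor(c(5, 5, 16, 16)),
  SH           = c(2, 4, 0, 11)
)

# dput() prints code that rebuilds the object, factor levels included;
# pasting that output into a post lets others recreate the data exactly
dput(toy)
```

With your own data, something like `dput(head(Data, 20))` would give helpers enough rows to reproduce the plot without sharing the whole file.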


As the other commenters have said, it will be difficult to know for certain what's going wrong until we see the actual data frame you used to make the plot. Two things for now:

First, are the values of SH all integers? Note that every one of the boxplot statistics shown in your plot is an integer value: 0, 2, 4, 5, 9 and 11. That would be unlikely unless SH takes only integer values.
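A quick way to check this is to test whether every value equals its rounded version. The sketch below is a hypothetical helper (`all_whole` is not a base R function); note that `is.integer()` only checks the storage type, so a numeric (double) column holding whole numbers would still return `FALSE` there.

```r
# Hypothetical helper: TRUE if every non-missing value is a whole number,
# allowing for tiny floating-point error
all_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  x <- x[!is.na(x)]
  all(abs(x - round(x)) < tol)
}

all_whole(c(0, 2, 4, 5, 9, 11))  # whole numbers stored as doubles
all_whole(c(0, 2.5, 4))          # contains a non-integer value
```

On the real data, `all_whole(Data$SH)` together with `table(Data$SH)` would show whether SH only takes a handful of integer values.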

Second, the midline of the boxplot is the median, rather than the mean. If SH is all integers, the median will likely be an integer. Is it possible that the median of SH is always 2 for the particular combinations of PPGENDER and INDUSTRY_col in your graph?
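To illustrate the difference, here is a minimal base-R sketch with made-up data (the `toy` data frame is hypothetical, not your dataset) in which both groups have a median of 2 but different means, so their boxplot midlines would line up even though the means differ.

```r
# Hypothetical toy data: both groups have median 2 but different means
toy <- data.frame(
  PPGENDER = factor(rep(c(1, 2), each = 10)),
  SH = c(0, 1, 2, 2, 2, 2, 2, 3, 4, 11,
         0, 2, 2, 2, 2, 2, 2, 5, 9, 11)
)

# The boxplot midline shows the median of each group ...
medians <- tapply(toy$SH, toy$PPGENDER, median)
# ... while the means can still differ
means <- tapply(toy$SH, toy$PPGENDER, mean)

medians
means
```

With your data, grouping by `interaction(Data$PPGENDER, Data$INDUSTRY_col)` in the same `tapply()` calls would show whether the medians really coincide. If you want the means on the plot as well, ggplot2's `stat_summary(fun = mean, geom = "point")` can overlay them; with the fill-dodged boxes you would likely also need a matching `position_dodge()` so the points sit on the right boxes.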
