Recently, I've been trying to create boxplots from two variables (months and counts). I tried both base R plots and ggplot2, but the plots from the two approaches were not the same.

My data look like this:

Months Counts
MAR 3055
DEC 8269
JAN 1012
FEB 2740
MAR 1888
OCT 5116
NOV 5614
DEC 5192
JAN 3172
FEB 5476
MAR 2881
OCT 979
NOV 2090
DEC 1148
JAN 735
FEB 3763
MAR 3027

For the above data set, I wanted to look at the spread of the data using boxplots. I used both base R plots and ggplot2.

SUR_MONTHS <- read.csv("F:/SUR_MONTHS.csv")
# Order the months by winter season instead of alphabetically
SUR_MONTHS$Months <- factor(SUR_MONTHS$Months, levels = c("OCT", "NOV", "DEC", "JAN", "FEB", "MAR"))

boxplot(SUR_MONTHS$Counts ~ SUR_MONTHS$Months, ylab = "Counts",
        xlab = "Months of the winter season")

Then, for the same data, I used ggplot2:

library(ggplot2)
SUR_MONTHS$Months <- factor(SUR_MONTHS$Months)
my.bp <- ggplot(data = SUR_MONTHS, aes(y = Counts, x = Months, fill = Months))
my.bp <- my.bp + geom_boxplot()
my.bp <- my.bp +  ylab("Counts of Species") + xlab("Months")
my.bp
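
A side note on the two `factor()` calls above, as a minimal sketch rather than a diagnosis: if the ggplot2 block is run in a fresh session (an assumption, since the thread does not say), `factor()` without `levels` sorts the months alphabetically, so the x axis order would differ from the base plot. If it is run right after the base plot code, the existing level order is kept. The month labels are copied from the data above; the variable names are only for illustration.

```r
# Month labels copied from the data posted above
months <- c("MAR", "DEC", "JAN", "FEB", "MAR", "OCT", "NOV", "DEC", "JAN",
            "FEB", "MAR", "OCT", "NOV", "DEC", "JAN", "FEB", "MAR")

# Explicit winter-season order, as in the boxplot() code
ordered_months <- factor(months, levels = c("OCT", "NOV", "DEC", "JAN", "FEB", "MAR"))
levels(ordered_months)

# factor() on the raw strings, as would happen in a fresh session:
# the levels come out in alphabetical order
default_months <- factor(months)
levels(default_months)

# Re-wrapping an existing factor keeps its level order
rewrapped <- factor(ordered_months)
levels(rewrapped)
```

This only affects the order of the boxes along the x axis, not their shape.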

So my question is: why do the graphs look different for the same data? Could anyone help me figure out my mistake?

I am afraid your question is very hard to read.
Could you reply to this message with the code from your R session copied into a 'preformatted text' section? Use the </> button

```
type or paste code here
```

and paste the code between the two lines of backticks. I know that you tried to do this (I see the text 'preformatted text'), but the code does not seem to be correct.
I also see only one boxplot. Please show the other one as well and indicate (in words) what is wrong with them. You cannot expect them to be identical if you use different code.

So, ideally, I would see in your reply:

  • common code for both cases
  • specific code for basic plot case
  • output plot for basic plot case
  • specific code for ggplot2 case
  • output plot for ggplot2 case
  • your explanation why one or both of these plots is/are different from your expectation.

And please don't forget some details about the data. Typically for "counts" you would use a column chart with a categorical variable on the x axis. A box plot assumes you have a distribution of "counts" for more than one Oct, Nov, etc., either by time, geography or some other variable.
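
To see how many observations sit behind each box, the rows can be tabulated by month. This is a minimal sketch that assumes the 17 rows posted in the question are the whole data set (the poster calls them a sample, so the real file may be larger); the name `sur` is made up for the example.

```r
# The 17 rows as posted in the question
sur <- data.frame(
  Months = c("MAR", "DEC", "JAN", "FEB", "MAR", "OCT", "NOV", "DEC", "JAN",
             "FEB", "MAR", "OCT", "NOV", "DEC", "JAN", "FEB", "MAR"),
  Counts = c(3055, 8269, 1012, 2740, 1888, 5116, 5614, 5192, 3172,
             5476, 2881, 979, 2090, 1148, 735, 3763, 3027)
)
sur$Months <- factor(sur$Months, levels = c("OCT", "NOV", "DEC", "JAN", "FEB", "MAR"))

# Number of observations per month, i.e. the sample size behind each box
n_per_month <- table(sur$Months)
n_per_month
```

With the posted rows, no month has more than four values, so each box summarises a very small sample.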

Thank you for your reply and your suggestions on how to present the code. I tried to update my post accordingly, but new users are restricted to adding only one image. I am copying everything below:

Base plot code:

SUR_MONTHS <- read.csv("F:/SUR_MONTHS.csv")
SUR_MONTHS$Months <- factor(SUR_MONTHS$Months, levels = c("OCT", "NOV", "DEC", "JAN", "FEB", "MAR"))

boxplot(SUR_MONTHS$Counts ~ SUR_MONTHS$Months, ylab = "Counts",
        xlab = "Months of the winter season")

Base plot graph:
Because of the restriction on new users, please see the base R plot that I added to my original question above; the ggplot2 graph follows below.

ggplot2 code:

library(ggplot2)
SUR_MONTHS$Months <- factor(SUR_MONTHS$Months)
my.bp <- ggplot(data = SUR_MONTHS, aes(y = Counts, x = Months, fill = Months))
my.bp <- my.bp + geom_boxplot()
my.bp <- my.bp +  ylab("Counts of Species") + xlab("Months")
my.bp

ggplot2 graph:

Rplot

So my question is: when I ran the base boxplot, the boxes for October and November looked different, and there was no outlier in March. Why do these differences occur between the ggplot2 plot and the base plot?

Thank you. I have re-added the details of my data above.

I thought it might be because the definition of the whiskers/outliers was different. But it seems to be the same for both functions. From the documentation:

  1. boxplot() - "the range parameter (default 1.5) determines how far the plot whiskers extend out from the box. If range is positive, the whiskers extend to the most extreme data point which is no more than range times the interquartile range from the box. A value of zero causes the whiskers to extend to the data extremes."

  2. geom_boxplot() and stat_boxplot() - "coef = length of the whiskers as multiple of IQR. Defaults to 1.5."
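
The multiplier is the same, but as far as I can tell the box edges it is applied to are not: `boxplot()` takes Tukey's hinges from `fivenum()` (via `boxplot.stats()`), while the ggplot2 documentation says its hinges are the 25th and 75th percentiles and that this can differ from `boxplot()` for small samples. A minimal sketch using only the four MAR counts from the question (variable names are made up for the example):

```r
# The four MAR counts from the data posted in the question
mar <- c(3055, 1888, 2881, 3027)

# boxplot(): box edges are Tukey's hinges from fivenum()
base_hinges <- fivenum(mar)[c(2, 4)]
base_fence  <- base_hinges[1] - 1.5 * diff(base_hinges)

# geom_boxplot(): box edges are the 25% and 75% quantiles from quantile()
gg_hinges <- unname(quantile(mar, c(0.25, 0.75)))
gg_fence  <- gg_hinges[1] - 1.5 * diff(gg_hinges)

base_hinges   # 2384.5 3041.0
base_fence    # 1399.75
gg_hinges     # 2632.75 3034.00
gg_fence      # 2030.875

# Points below the lower fence are drawn as outliers
mar[mar < base_fence]    # none: 1888 is inside the base R fence
mar[mar < gg_fence]      # 1888 falls outside the quantile-based fence
boxplot.stats(mar)$out   # numeric(0), matching the base plot
```

With only four values, the narrower quantile-based box pulls the lower fence above 1888, which would explain an outlier appearing for March in ggplot2 but not in the base plot.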

Thank you for your comments, but sorry, I am still a little confused. Could you please elaborate?
Also, if I use the ggplot2 graph, how should I describe and interpret it?

It's hard to guess what is going on without the data or a representative sample.

I have added a sample of my data to my question above. I am copying it here again in the hope that it gives you an idea.
These counts are for a single species recorded in different months.

Months Counts
MAR 3055
DEC 8269
JAN 1012
FEB 2740
MAR 1888
OCT 5116
NOV 5614
DEC 5192
JAN 3172
FEB 5476
MAR 2881
OCT 979
NOV 2090
DEC 1148
JAN 735
FEB 3763
MAR 3027

library(tidyverse)

sur_data <- data.frame(months =rep(c("OCT", "NOV", "DEC", "JAN", "FEB", "MAR"), 20))

sur_data <- mutate(sur_data, counts = runif(min = 100, max=8000, n=120))

boxplot(sur_data$counts ~ sur_data$months,ylab="Counts of Species", 
        xlab="Months of the winter season")

ggplot(data=sur_data, aes(x=months, y=counts)) + 
  geom_boxplot() +  
  ylab("Counts of Species") + 
  xlab("Months of the winter season")

The code above produces these two plots. First with base R:
image

Now with ggplot2:
image

These look the same. If you can modify the data in this example to show how you get different plots you may have discovered a bug in one of the packages.
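
One way to follow that suggestion without plotting is to swap in the 17 rows posted earlier and compare the box edges each function would use. This is only a sketch: it assumes those rows are the full data, and the names `sur`, `base_box`, `gg_box` and `comparison` are made up for the example.

```r
# The 17 rows as posted earlier in the thread
sur <- data.frame(
  Months = c("MAR", "DEC", "JAN", "FEB", "MAR", "OCT", "NOV", "DEC", "JAN",
             "FEB", "MAR", "OCT", "NOV", "DEC", "JAN", "FEB", "MAR"),
  Counts = c(3055, 8269, 1012, 2740, 1888, 5116, 5614, 5192, 3172,
             5476, 2881, 979, 2090, 1148, 735, 3763, 3027)
)
month_levels <- c("OCT", "NOV", "DEC", "JAN", "FEB", "MAR")
sur$Months <- factor(sur$Months, levels = month_levels)
by_month <- split(sur$Counts, sur$Months)

# Box edges as boxplot() computes them: Tukey's hinges from fivenum()
base_box <- t(sapply(by_month, function(x) fivenum(x)[c(2, 4)]))

# Box edges as quantile() computes them (default type 7)
gg_box <- t(sapply(by_month, function(x) unname(quantile(x, c(0.25, 0.75)))))

comparison <- data.frame(
  n          = lengths(by_month),
  base_lower = base_box[, 1],
  base_upper = base_box[, 2],
  gg_lower   = gg_box[, 1],
  gg_upper   = gg_box[, 2],
  row.names  = month_levels
)
comparison
```

In this table the months with three values agree, while the months with two or four values (OCT, NOV, MAR) do not. That points to the two quartile definitions diverging on very small samples rather than to a bug, though I have not checked this against the actual CSV.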

Thank you so much for your efforts and explanations.
