Histogram in R with "0" in its own bin

My data include a lot of zeros, so I would like to make a histogram that has "0" as its own bin. My attempts to do this return errors in R such as:

head(env_subset$den)
[1] 0.010339885 0.004557282 0.003436797 0.000000000 0.000000000 0.000000000

env_subset$bins <- cut(env_subset$den, breaks=c(0, 0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.4),
labels=c("0-0.001","0.001-0.005","0.005-0.01","0.01-0.05","0.05-0.1", "0.1+"))

Error in cut.default(env_subset$den, breaks = c(0, 0, 0.001, 0.005, 0.01, :
'breaks' are not unique
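For reference: cut() requires its breaks to be unique, which is why repeating 0 fails. A common workaround, sketched below under the assumption that the values are never negative, is to start the breaks at -Inf. With the default right = TRUE the first interval is then (-Inf, 0], which holds exactly the zeros. The sample vector is the head() output above. Replacing the last break (0.4) with Inf, so that "0.1+" is genuinely open-ended, is an assumption about what that label is meant to cover.

```r
# Sample values copied from head(env_subset$den) above (assumed non-negative)
den <- c(0.010339885, 0.004557282, 0.003436797, 0, 0, 0)

# Breaks must be unique, so instead of c(0, 0, ...) start at -Inf.
# With right = TRUE (the default) the intervals are (-Inf,0], (0,0.001], ...,
# so the first one contains only exact zeros when the data are non-negative.
bins <- cut(den,
            breaks = c(-Inf, 0, 0.001, 0.005, 0.01, 0.05, 0.1, Inf),
            labels = c("0", "0-0.001", "0.001-0.005", "0.005-0.01",
                       "0.01-0.05", "0.05-0.1", "0.1+"))
table(bins)
```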

If I try again without giving "0" its own bin, I get a plot where all the zeros are labeled NA:

env_subset$bins <- cut(env_subset$den, breaks=c(0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.4),
labels=c("0-0.001","0.001-0.005","0.005-0.01","0.01-0.05","0.05-0.1", "0.1+"))

ggplot(env_subset, aes(bins)) +
geom_bar() +
labs(title="Salinity (ppt)",x="Salinity range", y = "Frequency") +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text = element_text(size = 10)) +
theme(axis.title = element_text(size = 13, face = "bold"))
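The NA values come from cut()'s defaults. With right = TRUE and include.lowest = FALSE, the first interval is (0, 0.001], which excludes 0 itself. The short sketch below uses the head() values above as sample data to show this. It also shows that include.lowest = TRUE keeps the zeros, but only by merging them into the first bin with small positive values, which is not what is wanted here.

```r
# Sample values copied from head(env_subset$den) above
den <- c(0.010339885, 0.004557282, 0.003436797, 0, 0, 0)
brks <- c(0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.4)

# Default intervals are left-open: (0,0.001], (0.001,0.005], ...
# An exact 0 lies outside all of them, so cut() returns NA for it.
default_bins <- cut(den, brks)
sum(is.na(default_bins))

# include.lowest = TRUE closes the first interval to [0,0.001]:
# zeros are no longer NA, but they share that bin with small positive values.
lowest_bins <- cut(den, brks, include.lowest = TRUE)
levels(lowest_bins)[1]
```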


Hi,

Welcome to the RStudio community!

It might be easier to create the bins yourself with custom labels (here using dplyr's case_when()), then plot the count of each label, like so:

library(ggplot2)
library(dplyr)

#Generate some data
set.seed(2) #Only needed for reproducibility 
myData = data.frame(
  x = runif(250)
)

#Introduce many 0
myData$x[sample(1:nrow(myData), 200)] = 0

#Generate bins
myData = myData %>% mutate(
  bin = case_when(
    x == 0 ~ "0",
    x <= 0.3 ~ "0 - 0.3",
    x <= 0.7 ~ "0.3 - 0.7",
    TRUE ~ "0.7+" 
  )
)

#Plot counts per bin (geom_bar() counts rows per category, so no stat override is needed)
ggplot(myData, aes(x = bin)) + geom_bar()

Created on 2022-05-12 by the reprex package (v2.0.1)

Hope this helps,
PJ
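Applied to the breakpoints from the question, the same case_when() idea might look like the sketch below (the sample vector is the head() output from the question). One detail worth adding: case_when() returns a character vector, and ggplot2 orders character values on a discrete axis alphabetically. Converting to a factor with explicit levels is a simple way to pin the bars to numeric order.

```r
library(dplyr)

# Sample values copied from head(env_subset$den) in the question
den <- c(0.010339885, 0.004557282, 0.003436797, 0, 0, 0)

# Intended display order of the bins
bin_levels <- c("0", "0-0.001", "0.001-0.005", "0.005-0.01",
                "0.01-0.05", "0.05-0.1", "0.1+")

# Conditions are checked in order, so each value lands in the first match
bins <- case_when(
  den == 0     ~ "0",
  den <= 0.001 ~ "0-0.001",
  den <= 0.005 ~ "0.001-0.005",
  den <= 0.01  ~ "0.005-0.01",
  den <= 0.05  ~ "0.01-0.05",
  den <= 0.1   ~ "0.05-0.1",
  TRUE         ~ "0.1+"
)

# Fixing the level order makes ggplot2 draw the bars in numeric order
bins <- factor(bins, levels = bin_levels)
table(bins)
```

By default the bar chart drops empty bins from the axis. Adding scale_x_discrete(drop = FALSE) to the plot keeps them if every bin should appear.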


Thank you! I was able to get it, but had a really hard time setting the bins. I kept having to try different combinations of this part of the code:

c(0, seq(0.001, 0.45, .05))

env_subset$bins <- cut(env_subset$den, c(0, seq(0.001, 0.45, .05)), right = FALSE,
labels=c("0", "0.001-0.051","0.051-0.101", "0.101-0.151", "0.151-0.201", "0.201-0.251","0.251-0.301","0.301-0.351","0.351+"))

This helps a lot!
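Typing out the breaks and labels by hand is what makes this fiddly. One way around it, sketched below, is to generate both from a bin width and an upper limit. bin_with_zero() is a hypothetical helper, not part of any package. It assumes non-negative data and puts exact zeros in their own bin via a -Inf lower break. By contrast, the [0, 0.001) bin produced by right = FALSE above would also catch small positive values. Everything above the upper limit goes into a final open-ended bin.

```r
# Hypothetical helper (not from any package). Assumes x is non-negative.
# Exact zeros get their own bin, positive values are cut into right-closed
# intervals of width `width` up to `upper`, and larger values go into "upper+".
bin_with_zero <- function(x, width, upper) {
  inner <- seq(0, upper, by = width)   # 0, width, 2*width, ... up to upper
  n <- length(inner)
  breaks <- c(-Inf, inner, Inf)        # (-Inf,0], (0,width], ..., (last,Inf]
  labels <- c("0",
              paste0(inner[-n], "-", inner[-1]),
              paste0(inner[n], "+"))
  cut(x, breaks = breaks, labels = labels, right = TRUE)
}

# Small made-up example
bins <- bin_with_zero(c(0, 0, 0.01, 0.07, 0.31, 0.5), width = 0.05, upper = 0.4)
table(bins)
```

For the data in the question this would be called as env_subset$bins <- bin_with_zero(env_subset$den, width = 0.05, upper = 0.4).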

You were given a good method for configuring the bins directly by hand.
Here is a slightly more dynamic approach: set the zeros apart and let cut() split the rest into equal-width bins, controlled only by the number of bins. In my example I chose 5.

library(ggplot2)
library(dplyr)

#Generate some data
set.seed(2) #Only needed for reproducibility 
myData = data.frame(
  x = runif(250)
)

#Introduce many 0
myData$x[sample(1:nrow(myData), 200)] = 0

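#Keep the zeros as their own bin
myData0 <- filter(myData, x == 0) %>% mutate(bin = factor(x))
#Split the non-zero values into 5 equal-width bins
myDataNot0 <- filter(myData, x != 0) %>%
  mutate(bin = cut(x, 5L))

#Recombine the two parts
mydata_binned <- bind_rows(myData0,
                           myDataNot0)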

#Plot histogram
ggplot(mydata_binned, aes(x = bin)) + geom_bar()
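A variant of the same idea cuts only the non-zero values in place instead of splitting and re-binding the data frame. Because bind_rows() puts all the zeros first, the split-and-rebind approach loses the original row order. bin_zero_separately() below is a hypothetical helper, and it assumes x contains no NA values.

```r
# Hypothetical helper (not from any package). Assumes x has no NA values.
bin_zero_separately <- function(x, n_bins = 5L) {
  nonzero <- x != 0
  # Equal-width bins computed from the non-zero values only
  nz_bins <- cut(x[nonzero], breaks = n_bins)
  # Start with every value in the "0" bin, then overwrite the non-zeros
  bin <- rep("0", length(x))
  bin[nonzero] <- as.character(nz_bins)
  # "0" first, followed by cut()'s intervals in increasing order
  factor(bin, levels = c("0", levels(nz_bins)))
}

# Small made-up example; row order is preserved
x <- c(0, 0.1, 0, 0.5, 0.9)
binned <- bin_zero_separately(x, n_bins = 5L)
table(binned)
```

With dplyr it could be used as myData %>% mutate(bin = bin_zero_separately(x, 5L)) before the same ggplot() call.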