Unknown graph issue! Histogram ggplot2 axis??

Hello all,

There seems to be something wrong with my histogram.
I would really appreciate some help with this puzzle!!

I am trying to plot the hours at which crime was reported in the crime data set, by year. However, the data for the 'hours of the day crime was reported' (x axis) does not start at zero; it is slightly offset. I don't know why, and I am trying to reproduce a graph in which the data starts at zero (see below).

Also, in the original graph the max crimes reported per hour is >20000, while on my graph the counts only go up to 1500. Again, I am lost as to why.

All advice would be greatly appreciated!!

Here is the code (warning: it may take a few minutes to get the data set).

Here's my graph, if you can figure it out from this :)

Thank you!!!



library(tidyverse)
library(lubridate)
library(dslabs)
library(crimedata)

crime <- crimedata::get_crime_data(years = 2008:2018)

write.csv(crime, file = './data/crime.csv')

crime20 <- crime # renaming for clarity - it is not the whole data set

# Exploring hourly reporting

crime20_hour <- crime20 %>% 
    mutate(hour = hour(date_single))

## shift axis?????
ggplot(crime20_hour, aes(x = hour, stat = "identity"))+
  geom_histogram(binwidth=1, fill="grey")

All good!
I found the answer here:)
https://analyticslog.com/blog/2020/5/25/geom-histogram-make-bin-start-at-zero-ggplotstart-bin-at-zero-ggplot

Using boundary = 0 puts a bin edge (rather than a bin centre) at zero!

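library(ggthemes) # provides theme_solarized() and theme_clean()

ggplot(crime20_hour, aes(x = hour))+
  geom_histogram(binwidth=1, fill="grey", boundary=0)+
  theme_solarized()+
  labs(
    y = "Count",
    x = "Hour of day")+
  theme_clean()+ # complete theme, so it replaces theme_solarized() above
  xlim(0,25)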
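One thing that may be worth checking with this approach: geom_histogram() documents closed = "right" as its default, and with whole-number hours sitting exactly on whole-number bin edges, the way intervals are closed decides which bar a value falls into. The sketch below does not call ggplot2 at all; it only uses base R's cut() on a few made-up hour values to illustrate how right-closed and left-closed intervals treat values that lie on the edges. Whether the plot above is affected is an assumption to verify, for example by comparing it with the same call plus closed = "left".

```r
# Toy values, not the crime data: a few whole-number hours.
hours <- c(0, 1, 2, 23)

# Bin edges at whole numbers, as boundary = 0 with binwidth = 1 would place them.
edges <- 0:24

# Right-closed intervals: (a, b], with the lowest edge included in the first bin.
right_closed <- cut(hours, breaks = edges, right = TRUE, include.lowest = TRUE)

# Left-closed intervals: [a, b), with the highest edge included in the last bin.
left_closed <- cut(hours, breaks = edges, right = FALSE, include.lowest = TRUE)

# Compare which interval each hour lands in under the two conventions.
print(data.frame(hour = hours,
                 right_closed = as.character(right_closed),
                 left_closed = as.character(left_closed)))
```

In this toy example the right-closed version puts hour 0 and hour 1 in the same interval, while the left-closed version keeps every hour in its own interval.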

The bins are centred. The data is at 0, 1, 2, ..., 23, and so the bars are plotted such that they are centred over those values (with binwidth = 1, the bar for hour 0 spans -0.5 to 0.5). If you really want them to start at 0, then do your own summary of the data and just add 0.5 to x when plotting.

Inspection of the summary data shows that the maximum is ~17000 at x = 0, not >20000.

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(dslabs)
library(crimedata)
#> Warning: package 'crimedata' was built under R version 4.0.5

crime <- crimedata::get_crime_data(years = 2008:2018)

write.csv(crime, file = 'crime.csv')

crime20 <- crime # renaming for clarity - it is not the whole data set

# Exploring hourly reporting

crime20_hour <- crime20 %>% 
  mutate(hour = hour(date_single))



summary <- 
  crime20_hour %>% 
  group_by(hour) %>% 
  summarise(count = n()) %>% 
  ungroup() 
#> `summarise()` ungrouping output (override with `.groups` argument)

summary %>% 
ggplot(aes(x = hour+0.5, y = count))+
  geom_col(fill="grey", width = 1)

summary
#> # A tibble: 24 x 2
#>     hour count
#>    <int> <int>
#>  1     0 17623
#>  2     1  7025
#>  3     2  6073
#>  4     3  5062
#>  5     4  4159
#>  6     5  3239
#>  7     6  3395
#>  8     7  4292
#>  9     8  6202
#> 10     9  6578
#> # ... with 14 more rows

Created on 2021-04-09 by the reprex package (v1.0.0)
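For anyone who wants to try the count-then-shift idea without waiting for the crime data to download, here is a minimal sketch on made-up data. The counts are invented purely for illustration (they are not crime figures), and the names fake_hours, hourly and p are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Made-up data: one row per "report", with whole-number hours 0..23.
# The per-hour counts (240, 230, ..., 10) are invented for illustration.
fake_hours <- tibble(hour = rep(0:23, times = seq(240, 10, by = -10)))

# Summarise first: one row per hour with its count.
hourly <- count(fake_hours, hour, name = "count")

# Shift each bar by half its width so the bar for hour h spans h to h + 1.
p <- ggplot(hourly, aes(x = hour + 0.5, y = count)) +
  geom_col(fill = "grey", width = 1) +
  scale_x_continuous(breaks = seq(0, 24, by = 4)) +
  labs(x = "Hour of day", y = "Count")

print(p)
```

Because the counting is done before plotting, each bar is guaranteed to hold exactly one hour, and the only plotting decision left is where to draw it.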
