There seems to be something wrong with my histogram.
I would really appreciate some help with this puzzle!!
I am trying to plot the hour at which each crime was reported in the crime data set, by year. However, the data for the 'hours of the day crime was reported' (x axis) do not start at zero; the bars are slightly offset. I don't know why, and I am trying to reproduce a graph in which the data start at zero (see below).
The bins are centred. The data are at 0, 1, 2, ..., 23, and so the bars are plotted such that they are centred over those values (with a bar width of 1, the bar for hour 0 spans -0.5 to 0.5). If you really want them to start at 0, then do your own summary of the data and just add 0.5 to x when plotting.
Inspection of the summary data shows that the maximum is ~17000 at x = 0, not >20000.
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(dslabs)
library(crimedata)
#> Warning: package 'crimedata' was built under R version 4.0.5
crime <- crimedata::get_crime_data(years = 2008:2018)
write.csv(crime, file = 'crime.csv')
crime20 <- crime # renaming for clarity - it is not the whole data set
# Exploring hourly reporting
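crime20_hour <- crime20 %>%
  mutate(hour = hour(date_single))
# Count reports per hour; named hourly_counts to avoid masking base::summary()
hourly_counts <-
  crime20_hour %>%
  group_by(hour) %>%
  summarise(count = n()) %>%
  ungroup()
#> `summarise()` ungrouping output (override with `.groups` argument)
# Shift x by 0.5 so each width-1 bar spans hour to hour + 1 and the first bar starts at 0
hourly_counts %>%
  ggplot(aes(x = hour + 0.5, y = count)) +
  geom_col(fill = "grey", width = 1)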
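If you would rather not summarise the data yourself, another option that may work is to give geom_histogram() explicit bin edges, so the first bar starts at 0 without shifting x. This is only a sketch: the `hours` data frame below is made-up stand-in data (random integer hours 0 to 23), not the crime data, and `closed = "left"` is chosen so that each integer hour falls in the bin that starts at that hour.

```r
library(ggplot2)

# Hypothetical stand-in for the hour column: random integer hours 0-23
set.seed(1)
hours <- data.frame(hour = sample(0:23, 1000, replace = TRUE))

# Explicit bin edges at 0, 1, ..., 24; closed = "left" puts hour h in the bin
# running from h to h + 1, so the first bar starts at 0
p <- ggplot(hours, aes(x = hour)) +
  geom_histogram(breaks = 0:24, closed = "left", fill = "grey")

# Pull out the computed bins to check where the bars actually sit
bins <- ggplot_build(p)$data[[1]]
bins[, c("xmin", "xmax", "count")]
```

With the real data, `hours` would be replaced by the data frame that holds the `hour` column. Setting the edges explicitly avoids relying on how the default binning treats values that sit exactly on a bin boundary.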