Histogram missing a single bar of data

Can someone help me figure out why the histogram is missing a single day's worth of data across all years from 2016-2023, when the data is present in the referenced tibble?

library(tidyverse)  # dplyr (%>%, filter) and ggplot2
library(ggthemes)   # provides theme_clean()

FEB_Histogram <- FEB_2016_2023_Data_2 %>% filter(Year != 2024) %>% 
  ggplot()+
  geom_histogram(aes(x = Day))+
  theme_clean()+
  labs(x = "Days of the Month",
       y = "Crimes Instances",
       title = "February Crime Density (2016 - 2023)")+
  geom_hline(yintercept = 19.03448, col = "red", linewidth = 2)+
  annotate("text", x=3.2, y=20.5, label="2016-2023 Mean", size = 3.2)+
  scale_x_continuous(breaks = 1:29)
FEB_Histogram

Hi @steelsabre. Are you able to share the data to assist in troubleshooting? If not, does the missing day show up in the following?

FEB_2016_2023_Data_2 %>% filter(Year != 2024) %>% count(Day)

@scottyd22 Below is a screenshot of the data. As you can see, the != 2024 filter doesn't affect it, as there are data points for other years. Additionally, it will not affect anything until I start adding 2024 data for February, since we haven't reached that date yet. I have been using the same code for previous months and have not encountered this problem. If the screenshot does not help, I can try to pare down the data and upload it, as the file is massive.

Thanks for the screenshot. Nothing jumps out as being a problem. Can you please copy and paste the output of the code below? It produces a sample of up to two rows for every Day in the dataset.

FEB_2016_2023_Data_2 %>% 
  filter(Year != 2024) %>% 
  group_by(Day) %>%
  filter(row_number() <= 2) %>%
  ungroup() %>%
  dput()

structure(list(Incident = c(1602010041, 1602010169, 1602020136,
1602020207, 1602030105, 1602040044, 1602040092, 1602050123, 1602050175,
1602060071, 1602060150, 1602070007, 1602070070, 1602080098, 1602080161,
1602090025, 1602090074, 1602100023, 1602100084, 1602110194, 1602120099,
1602130094, 1602140029, 1602140102, 1602150001, 1602150057, 1602160079,
1602160136, 1602170163, 1602170194, 1602180149, 1602190122, 1602190150,
1602200066, 1602200074, 1602210054, 1602210126, 1602220042, 1602220097,
1602240107, 1602240153, 1602250071, 1602250088, 1602260158, 1602260160,
1602270033, 1602290047, 1602290072, 1702110035, 1702120098, 1702130094,
1702180049, 1702230011, 1702230066, 1702270100, 1702280069, 1802030018,
1802280132), Year = c(2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2018, 2018), Month = c("-02-",
"-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-",
"-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-",
"-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-",
"-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-",
"-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-",
"-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-",
"-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-", "-02-",
"-02-"), Day = c(1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
9, 9, 10, 10, 11, 12, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18,
19, 19, 20, 20, 21, 21, 22, 22, 24, 24, 25, 25, 26, 26, 27, 29,
29, 11, 12, 13, 18, 23, 23, 27, 28, 3, 28), Time = c(751, 1430,
1321, 1914, 1252, 648, 1123, 1528, 2034, 1007, 1700, 36, 1016,
1120, 1640, 707, 1302, 703, 1138, 1745, 1335, 1347, 750, 1536,
0, 1138, 1102, 1609, 1702, 1953, 1621, 1515, 1821, 1048, 1204,
850, 1633, 855, 1459, 1636, 2043, 1056, 1229, 1905, 1922, 823,
903, 1105, 758, 1928, 1034, 833, 404, 1016, 1342, 1033, 757,
1358), Location = c("1100 BLOCK NORSAM RD, GLAD", "200 BLOCK E LANCASTER AVE, ARDM",
"300 BLOCK LOCUST AVE, ARDM", "100 BLOCK EDGEHILL RD, BCYN",
"1600 BLOCK OAKWOOD DR, PNVY", "1400 BLOCK CITY AVE, OVHL", "300 BLOCK N LATCHES LN, MERN",
"1400 BLOCK W MONTGOMERY AVE, ROMT", "100 BLOCK W CITY AVE, BCYN",
"500 BLOCK W LANCASTER AVE, HVRD", "UNIT BLOCK E LANCASTER AVE, ARDM",
"1100 BLOCK W LANCASTER AVE, BMWR", "500 BLOCK ROCK GLEN RD, WYNN",
"100 BLOCK ORCHARD RD, HVRD", "700 BLOCK W LANCASTER AVE, BMWR",
"200 BLOCK MARLBORO RD, ARDM", "100 BLOCK UNION AVE, BCYN", "100 BLOCK DAVID RD, BCYN",
"1300 BLOCK VALLEY RD, VILL", "UNIT BLOCK E LANCASTER AVE, ARDM",
"100 BLOCK N LATCHS LN, BCYN", "100 BLOCK E CITY AVE, BCYN",
"100 BLOCK WALNUT AVE, ARDM", "100 BLOCK E CITY AVE, BCYN", "500 BLOCK E CITY AVE, BCYN",
"100 BLOCK LANCASTER AVE, OVBK", "300 BLOCK VALLEY RD, MERN",
"100 BLOCK W LANCASTER AVE, ARDM", "500 BLOCK E CITY AVE, BCYN",
"100 BLOCK W CITY AVE, BCYN", "300 BLOCK VALLEY RD, MERN", "UNIT BLOCK E LANCASTER AVE, ARDM",
"100 BLOCK W CITY AVE, BCYN", "100 BLOCK E CITY AVE, BALA", "800 BLOCK W LANCASTER AVE, BMWR",
"400 BLOCK HOLLY LN, WYNN", "UNIT BLOCK ST JAMES PL, ARDM", "100 BLOCK LANCASTER AVE, OVBK",
"100 BLOCK E CITY AVE, BCYN", "UNIT BLOCK E LANCASTER AVE, ARDM",
"700 BLOCK MT PLEASANT RD, BMWR", "100 BLOCK FAIRVIEW AVE, BHIL",
"100 BLOCK W CITY AVE, BCYN", "100 BLOCK E WYNNEWOOD RD, OVBK",
"UNIT BLOCK E WYNNEWOOD RD, WYNN", "100 BLOCK OLD BELMONT AVE, BHIL",
"200 BLOCK SIMPSON RD, ARDM", "300 BLOCK PENBREE TER, BCYN",
"UNIT BLOCK E LANCASTER AVE, ARDM", "200 BLOCK ROCK GLEN RD, PNWN",
"1300 BLOCK SUSSEX RD, WYNN", "1200 BLOCK CHERMAR LN, PNVY",
"UNIT BLOCK CONSHOHOCKEN STATE RD, BCYN", "200 BLOCK CURWEN RD, ROMT",
"400 BLOCK GREAT SPRINGS RD, BMWR", "100 BLOCK W CITY AVE, BCYN",
"100 BLOCK EDGEHILL RD, BCYN", "100 BLOCK HARVEST CIR, PNVY"),
Description = c("VEHICLE THEFT-STOLEN LOCAL-RESIDENTIAL",
"THEFT-$50 TO $200-FROM BUILDINGS", "THEFT-OVER $200-FROM BUILDINGS",
"BURGLARY-NO FORCE-RESIDENCE-DAY", "THEFT-UNDER $50-FROM BUILDINGS",
"THEFT-OVER $200-RETAIL THEFT", "THEFT-$50 TO $200-FROM BUILDINGS",
"THEFT-UNDER $50-FROM BUILDINGS", "THEFT-ATTEMPTED-RETAIL THEFT",
"THEFT-$50 TO $200-FROM BUILDINGS", "THEFT-UNDER $50-POCKET PICKING",
"THEFT-OVER $200-OTHER", "THEFT-$50 TO $200-FROM BUILDINGS",
"BURGLARY-FORCE-RESIDENCE-UNKNOWN", "THEFT-UNDER $50-RETAIL THEFT",
"THEFT-ATTEMPTED-FROM MOTOR VEHICLE", "THEFT-ATTEMPTED-FROM BUILDINGS",
"BURGLARY-NO FORCE-RESIDENCE-DAY", "BURGLARY-FORCE-RESIDENCE-UNKNOWN",
"THEFT-UNDER $50-MV PARTS & ACCESSORIES", "THEFT-OVER $200-BICYCLES",
"THEFT-$50 TO $200-RETAIL THEFT", "THEFT-OVER $200-FROM MOTOR VEHICLE",
"THEFT-OVER $200-RETAIL THEFT", "THEFT-$50 TO $200-OTHER",
"THEFT-$50 TO $200-OTHER", "THEFT-$50 TO $200-MV PARTS & ACCESSORIES",
"THEFT-UNDER $50-RETAIL THEFT", "THEFT-$50 TO $200-FROM BUILDINGS",
"THEFT-$50 TO $200-RETAIL THEFT", "THEFT-UNDER $50-FROM MOTOR VEHICLE",
"THEFT-OVER $200-FROM BUILDINGS", "THEFT-$50 TO $200-RETAIL THEFT",
"THEFT-OVER $200-RETAIL THEFT", "THEFT-OVER $200-FROM BUILDINGS",
"THEFT-$50 TO $200-OTHER", "THEFT-UNDER $50-FROM BUILDINGS",
"THEFT-ATTEMPTED-FROM BUILDINGS", "THEFT-OVER $200-RETAIL THEFT",
"THEFT-OVER $200-PURSE SNATCHING", "ROBBERY-STRONG ARM-RESIDENCE",
"THEFT-UNDER $50-FROM MOTOR VEHICLE", "THEFT-$50 TO $200-RETAIL THEFT",
"THEFT-$50 TO $200-OTHER", "THEFT-UNDER $50-POCKET PICKING",
"BURGLARY-FORCE-NON RES-NIGHT", "THEFT-$50 TO $200-FROM BUILDINGS",
"BURGLARY-FORCE-RESIDENCE-DAY", "THEFT-OVER $200-FROM BUILDINGS",
"THEFT-UNDER $50-FROM MOTOR VEHICLE", "THEFT-ATTEMPTED-FROM MOTOR VEHICLE",
"THEFT-$50 TO $200-FROM MOTOR VEHICLE", "THEFT-OVER $200-POCKET PICKING",
"BURGLARY-NO FORCE-RESIDENCE-UNKNOWN", "THEFT-UNDER $50-FROM BUILDINGS",
"THEFT-UNDER $50-RETAIL THEFT", "VEHICLE THEFT-STOLEN LOCAL-RESIDENTIAL",
"BURGLARY-FORCE-RESIDENCE-UNKNOWN"), Location2 = c("1100 BLOCK NORSAM RD",
"200 BLOCK E LANCASTER AVE", "300 BLOCK LOCUST AVE", "100 BLOCK EDGEHILL RD",
"1600 BLOCK OAKWOOD DR", "1400 BLOCK CITY AVE", "300 BLOCK N LATCHES LN",
"1400 BLOCK W MONTGOMERY AVE", "100 BLOCK W CITY AVE", "500 BLOCK W LANCASTER AVE",
"UNIT BLOCK E LANCASTER AVE", "1100 BLOCK W LANCASTER AVE",
"500 BLOCK ROCK GLEN RD", "100 BLOCK ORCHARD RD", "700 BLOCK W LANCASTER AVE",
"200 BLOCK MARLBORO RD", "100 BLOCK UNION AVE", "100 BLOCK DAVID RD",
"1300 BLOCK VALLEY RD", "UNIT BLOCK E LANCASTER AVE", "100 BLOCK N LATCHS LN",
"100 BLOCK E CITY AVE", "100 BLOCK WALNUT AVE", "100 BLOCK E CITY AVE",
"500 BLOCK E CITY AVE", "100 BLOCK LANCASTER AVE", "300 BLOCK VALLEY RD",
"100 BLOCK W LANCASTER AVE", "500 BLOCK E CITY AVE", "100 BLOCK W CITY AVE",
"300 BLOCK VALLEY RD", "UNIT BLOCK E LANCASTER AVE", "100 BLOCK W CITY AVE",
"100 BLOCK E CITY AVE", "800 BLOCK W LANCASTER AVE", "400 BLOCK HOLLY LN",
"UNIT BLOCK ST JAMES PL", "100 BLOCK LANCASTER AVE", "100 BLOCK E CITY AVE",
"UNIT BLOCK E LANCASTER AVE", "700 BLOCK MT PLEASANT RD",
"100 BLOCK FAIRVIEW AVE", "100 BLOCK W CITY AVE", "100 BLOCK E WYNNEWOOD RD",
"UNIT BLOCK E WYNNEWOOD RD", "100 BLOCK OLD BELMONT AVE",
"200 BLOCK SIMPSON RD", "300 BLOCK PENBREE TER", "UNIT BLOCK E LANCASTER AVE",
"200 BLOCK ROCK GLEN RD", "1300 BLOCK SUSSEX RD", "1200 BLOCK CHERMAR LN",
"UNIT BLOCK CONSHOHOCKEN STATE RD", "200 BLOCK CURWEN RD",
"400 BLOCK GREAT SPRINGS RD", "100 BLOCK W CITY AVE", "100 BLOCK EDGEHILL RD",
"100 BLOCK HARVEST CIR"), Location3 = c("GLAD", "ARDM", "ARDM",
"BCYN", "PNVY", "OVHL", "MERN", "ROMT", "BCYN", "HVRD", "ARDM",
"BMWR", "WYNN", "HVRD", "BMWR", "ARDM", "BCYN", "BCYN", "VILL",
"ARDM", "BCYN", "BCYN", "ARDM", "BCYN", "BCYN", "OVBK", "MERN",
"ARDM", "BCYN", "BCYN", "MERN", "ARDM", "BCYN", "BALA", "BMWR",
"WYNN", "ARDM", "OVBK", "BCYN", "ARDM", "BMWR", "BHIL", "BCYN",
"OVBK", "WYNN", "BHIL", "ARDM", "BCYN", "ARDM", "PNWN", "WYNN",
"PNVY", "BCYN", "ROMT", "BMWR", "BCYN", "BCYN", "PNVY"),
Description2 = c("VEHICLE THEFT", "THEFT", "THEFT", "BURGLARY",
"THEFT", "THEFT", "THEFT", "THEFT", "THEFT", "THEFT", "THEFT",
"THEFT", "THEFT", "BURGLARY", "THEFT", "THEFT", "THEFT",
"BURGLARY", "BURGLARY", "THEFT", "THEFT", "THEFT", "THEFT",
"THEFT", "THEFT", "THEFT", "THEFT", "THEFT", "THEFT", "THEFT",
"THEFT", "THEFT", "THEFT", "THEFT", "THEFT", "THEFT", "THEFT",
"THEFT", "THEFT", "THEFT", "ROBBERY", "THEFT", "THEFT", "THEFT",
"THEFT", "BURGLARY", "THEFT", "BURGLARY", "THEFT", "THEFT",
"THEFT", "THEFT", "THEFT", "BURGLARY", "THEFT", "THEFT",
"VEHICLE THEFT", "BURGLARY"), Time_Conversion = structure(c(-2209046940,
-2209023000, -2209027140, -2209005960, -2209028880, -2209050720,
-2209034220, -2209019520, -2209001160, -2209038780, -2209014000,
-2209073040, -2209038240, -2209034400, -2209015200, -2209049580,
-2209028280, -2209049820, -2209033320, -2209011300, -2209026300,
-2209025580, -2209047000, -2209019040, -2209075200, -2209033320,
-2209035480, -2209017060, -2209013880, -2209003620, -2209016340,
-2209020300, -2209009140, -2209036320, -2209031760, -2209043400,
-2209015620, -2209043100, -2209021260, -2209015440, -2209000620,
-2209035840, -2209030260, -2209006500, -2209005480, -2209045020,
-2209042620, -2209035300, -2209046520, -2209005120, -2209037160,
-2209044420, -2209060560, -2209038240, -2209025880, -2209037220,
-2209046580, -2209024920), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Time_Round = structure(c(-2209046400, -2209021200,
-2209028400, -2209006800, -2209028400, -2209050000, -2209035600,
-2209021200, -2208999600, -2209039200, -2209014000, -2209071600,
-2209039200, -2209035600, -2209014000, -2209050000, -2209028400,
-2209050000, -2209032000, -2209010400, -2209024800, -2209024800,
-2209046400, -2209017600, -2209075200, -2209032000, -2209035600,
-2209017600, -2209014000, -2209003200, -2209017600, -2209021200,
-2209010400, -2209035600, -2209032000, -2209042800, -2209014000,
-2209042800, -2209021200, -2209014000, -2208999600, -2209035600,
-2209032000, -2209006800, -2209006800, -2209046400, -2209042800,
-2209035600, -2209046400, -2209006800, -2209035600, -2209042800,
-2209060800, -2209039200, -2209024800, -2209035600, -2209046400,
-2209024800), tzone = "UTC", class = c("POSIXct", "POSIXt"
))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-58L))

Thank you for the sample data. It looks like adding a binwidth argument should work.

geom_histogram(aes(x = Day), binwidth = 1)

Thank you! This worked. I tried binwidth before, but I mistakenly put it inside the aes() parentheses (x = Day, binwidth = 1). Do you mind explaining how you knew it was a binwidth problem?

Sure! I first recreated your plot and specified fill values for days 14 and 15 to see where they were plotting (see below). Once I saw they were off center, I looked up how to "center" the bars. In the documentation for geom_histogram(), under the center argument, it said to set binwidth = 1 when centering on an integer.
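This is not the plot referred to above, just a minimal sketch of the same diagnosis on made-up data. It assumes one record for each of the 29 days (the `days` data frame is hypothetical) and uses `layer_data()` to inspect the bins that `geom_histogram()` computes. With the default of 30 bins spread over only 29 distinct integer days, at least one bin has to come up empty, which shows up as a "missing" bar; `binwidth = 1` gives each day its own bin.

```r
library(ggplot2)

# Hypothetical stand-in data: exactly one record for each day 1..29
days <- data.frame(Day = 1:29)

# Default binning (ggplot2 falls back to bins = 30 and prints a message)
p_default <- ggplot(days) + geom_histogram(aes(x = Day))

# One bin per integer day
p_fixed <- ggplot(days) + geom_histogram(aes(x = Day), binwidth = 1)

# layer_data() returns the computed bins (xmin, xmax, count, ...) for a layer
default_bins <- layer_data(p_default)
fixed_bins   <- layer_data(p_fixed)

# Bins that received no data, i.e. gaps in the drawn histogram
default_bins[default_bins$count == 0, c("xmin", "xmax", "count")]
fixed_bins[fixed_bins$count == 0, c("xmin", "xmax", "count")]
```

Comparing the `xmin`/`xmax` columns of the two results against the integer days is another way to see the bars drifting off center under the default binning.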


Thank you again, I'll add this to my kitbag.

That solution is great. Just as a side note, the point of a histogram is its automated binning, so if you're not going to bin several days together, you're probably better off with geom_bar() or geom_col().

For example:

FEB_2016_2023_Data_2 %>%
  filter(Year != 2024) %>%
  ggplot()+
  geom_bar(aes(x = Day))+
  theme_clean()+
  labs(x = "Days of the Month",
       y = "Crimes Instances",
       title = "February Crime Density (2016 - 2023)")+
  geom_hline(yintercept = 19.03448, col = "red", linewidth = 2)+
  annotate("text", x=3.2, y=20.5, label="2016-2023 Mean", size = 3.2)+
  scale_x_continuous(breaks = 1:29)
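
The geom_col() route mentioned above could look roughly like the sketch below. The difference is that geom_col() expects the counts to be computed already, so the data is summarised with count() first. The `crimes` data frame is a small hypothetical stand-in for FEB_2016_2023_Data_2, not the real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for FEB_2016_2023_Data_2 (only the Day column matters here)
crimes <- data.frame(Day = c(1, 1, 2, 4, 4, 4))

# One row per day that has records, with the number of records in column n
daily_counts <- crimes %>% count(Day)

# geom_col() draws bar heights straight from the precomputed n column
p <- ggplot(daily_counts) +
  geom_col(aes(x = Day, y = n)) +
  labs(x = "Days of the Month", y = "Crimes Instances")
p
```

Note that a day with no records (day 3 in this toy data) has no row in daily_counts and therefore no bar, which is an honest gap rather than a binning artifact.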
