I'm working on hockey analytics, specifically modeling the goals scored as a poisson distribution.
So far, I've been able to create a new column, time_diff
, that finds the number of minutes passed since the last goal scored.
Now, I'd like to add in a row at the end of very game (represented by game_id
) that displays the number of minutes passed in the game from the last goal scored to the end of the game.
For example, in game_id
2007020001
, the last goal scored in the game was at 59.56 minutes. So, I'd like to insert a row right after this game with event_type
column containing the value, game_end
and column time_diff
containing the value 0.44
I've seen some data.table
solutions on SO, but would love to see a tidyverse
solution since I'm not used to data.table
Here is subset of my data:
2007020001L, 2007020001L, 2007020002L, 2007020002L, 2007020002L,
2007020002L, 2007020002L, 2007020003L, 2007020003L, 2007020003L,
2007020003L, 2007020003L, 2007020004L, 2007020004L, 2007020004L
), session = c("R", "R", "R", "R", "R", "R", "R", "R", "R", "R",
"R", "R", "R", "R", "R", "R", "R", "R"), game_seconds = c(515L,
1815L, 2470L, 3189L, 3574L, 649L, 919L, 1373L, 1450L, 1737L,
217L, 781L, 2971L, 3168L, 3665L, 586L, 2072L, 2358L), event_type = c("GOAL",
"GOAL", "GOAL", "GOAL", "GOAL", "GOAL", "GOAL", "GOAL", "GOAL",
"GOAL", "GOAL", "GOAL", "GOAL", "GOAL", "GOAL", "GOAL", "GOAL",
"GOAL"), time_diff = c(8.58, 21.67, 10.92, 11.98, 6.42, 10.82,
4.5, 7.57, 1.28, 4.78, 3.62, 9.4, 36.5, 3.28, 8.28, 9.77, 24.77,
4.77)), row.names = c(NA, -18L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = c("game_id", "session"), drop = TRUE, indices = list(
0:4, 5:9, 10:14, 15:17), group_sizes = c(5L, 5L, 5L, 3L), biggest_group_size = 5L, labels = structure(list(
game_id = 2007020001:2007020004, session = c("R", "R", "R",
"R")), row.names = c(NA, -4L), class = "data.frame", vars = c("game_id",
"session"), drop = TRUE))