I have a dataset, df, where I am trying to calculate the percent increase of a particular group over a time period. Here is the dataset:
date size type
1/1/2020 1 a
1/1/2020 1 a
1/1/2020 3 a
1/1/2020 1 b
1/1/2020 2 b
1/1/2020 0 b
2/1/2020 5 a
2/1/2020 6 a
2/1/2020 3 a
2/1/2020 20 b
2/1/2020 21 b
2/1/2020 30 b
Desired output
date increase diff type
1/1/2020 200% 2 a
1/1/2020 -40% -2 a
2/1/2020 -100 -1 b
2/1/2020 50% 10 b
Percent Increase/Change is final-inital/initial * 100
example for a , we start at 1 then end at 3,
which is a 200% increase in the month of January
This is what I am doing:
df %>%
group_by(type, size) %>%
mutate(
increase = (lead(size) - size),
percent_increase = ((lead(size) - size)/size) * 100,
begin = date,
end = lead(date)) %>%
filter(!is.na(percent_increase)) %>%
arrange(location)
However, my dates are not grouping correctly. Since I only have dates that are consecutive first of the month, I figure I can just use freq = '1D'
date
1/1/2020
1/1/2020
1/1/2020
1/1/2020
I suppose I would have to groupby the type as well as the date, but I am not sure and I am still researching this.
Any suggestion is appreciated.
dput:
structure(list(Date = c("1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020",
"1/1/2020", "1/1/2020", "2/1/2020", "2/1/2020", "2/1/2020", "2/1/2020",
"2/1/2020", "2/1/2020"), size = c(1L, 1L, 3L, 1L, 2L, 0L, 5L,
6L, 3L, 20L, 21L, 30L), type = c("a", "a", "a", "b", "b", "b",
"a", "a", "a", "b", "b", "b")), class = "data.frame", row.names = c(NA,
-12L))