The base dataset is "pedestrian" from the tsibble package:
pedestrian
#> # A tsibble: 66,037 x 5 [1h] <Australia/Melbourne>
#> # Key: Sensor [4]
#> Sensor Date_Time Date Time Count
#> <chr> <dttm> <date> <int> <int>
#> 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630
#> 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826
#> 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567
#> 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264
#> 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139
#> 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77
#> 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01 6 44
#> 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01 7 56
#> 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01 8 113
#> 10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01 9 166
#> # ℹ 66,027 more rows
Created on 2024-08-20 with reprex v2.1.0
I want to calculate the hourly average pedestrian count (column Count) on a weekly basis, i.e. the average number of pedestrians in each week for hour 0, 1, 2, etc. The result should be plotted with gg_season() using the argument period = "day".
I am able to do the summary and calculate the average like this:
df <- pedestrian %>%
  # Just simplifying data
  filter(lubridate::year(Date) == 2015 & Sensor == "Bourke Street Mall (North)") %>%
  fill_gaps() %>%
  group_by(Time) %>%
  index_by(yrweek = yearweek(Date_Time)) %>%
  summarise(avg = mean(Count)) %>%
  arrange(yrweek, Time)
df
#> # A tsibble: 1,105 x 3 [1W]
#> # Key: Time [25]
#> Time yrweek avg
#> <int> <week> <dbl>
#> 1 0 2015 W08 419.
#> 2 1 2015 W08 351
#> 3 2 2015 W08 192
#> 4 3 2015 W08 150.
#> 5 4 2015 W08 106.
#> 6 5 2015 W08 82
#> 7 6 2015 W08 94.2
#> 8 7 2015 W08 240.
#> 9 8 2015 W08 566.
#> 10 9 2015 W08 756.
#> # ℹ 1,095 more rows
In the next step I want to plot the hourly averages for each week like this:
df %>% gg_season(avg, period = "day")
I guess I need to change the "yrweek" column so that it also includes the hour.
How can I achieve this, please?
Thanks...