Hello,
I have a question about converting a tibble to a tsibble.
My data has a date column and a zone column.
The same zone can appear many times.
I am trying something like this:
big_data <- big_data_0 %>%
  mutate(datef = as.Date(paste(year, month, "01", sep = "-"))) %>%
  group_by(zone, datef) %>%
  summarise(n = n()) %>%
  as_tsibble(index = datef)
But it fails. I am trying to do this because my data is a kind of time series.
I need to process tables by the datef variable, using a second variable to count, compute means, and so on. I would rather not declare datef as a factor. Besides, tsibble offers many analyses and is fast.
The error I get from the code above says that I have duplicated rows, and that is where I get confused.
Grouping by datef and zone should produce exactly one row per combination of datef and zone.
So I am definitely doing something wrong.
Here is some data to reproduce the error:
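library(dplyr)
library(tsibble)

df_complete <- expand.grid(
  year = 2010:2024,
  month = 1:12,
  zone = 1:20
)

# Sample with replacement so that (year, month, zone) combinations repeat
df <- df_complete %>%
  sample_n(size = 6000, replace = TRUE)

# This is the call that fails with the duplicated-rows error
df %>%
  mutate(date = as.Date(paste(year, month, "01", sep = "-"))) %>%
  group_by(date, zone) %>%
  count() %>%
  as_tsibble()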
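
For reference, here is a minimal sketch of a check that may help narrow this down. It assumes that zone is meant to identify each series, which the original code does not declare. tsibble requires every key and index combination to be unique, and with no key given, all zones that share a date look like duplicates of each other. The names counts, dup_no_key and ts_counts, and the seed, are only illustrative.

```r
library(dplyr)
library(tsibble)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

df <- expand.grid(year = 2010:2024, month = 1:12, zone = 1:20) %>%
  as_tibble() %>%
  sample_n(size = 6000, replace = TRUE)

# One row per (date, zone) combination; count() returns an ungrouped tibble
counts <- df %>%
  mutate(date = as.Date(paste(year, month, "01", sep = "-"))) %>%
  count(date, zone)

# Rows that share an index value when no key is declared
dup_no_key <- duplicates(counts, index = date)

# Assumption: zone identifies each series, so it is passed as the key
ts_counts <- as_tsibble(counts, key = zone, index = date)
```

If dup_no_key is not empty while the as_tsibble() call with key = zone succeeds, the duplicates come from the missing key rather than from the grouping step.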