Representing partial-year time series

danpelota · April 20, 2020, 12:43pm

I'm new to time series modeling (working through the Hyndman/Athanasopoulos book) and trying to understand how to represent a non-continuous dataset in R.

I have a dataset of daily absentee ballot returns in an American election. The data is only relevant for about 50 days before the election date, which is always on a Tuesday.

There are four "cycles" of data, each representing a year: 2014, 2016, 2018, and 2020. For each year, there are 51 continuous days' worth of data, indexed starting at -50 (50 days before election day) and ending at 0 (election day).

There is a natural 7-day frequency since some geographies do not count ballots on weekends.

The cumulative number of votes looks like this:

library(ggplot2)

# Columns: cycle, days_to_election (-50..0), total_votes (cumulative)
votes <- read.csv('votes.csv')
ggplot(votes,
       aes(x = days_to_election, y = total_votes, color = factor(cycle))) +
  geom_line()

My goal is to explore trends and potentially forecast votes for 2020 based on the previous cycles, but I'm not sure how to represent this dataset since it's not continuous across cycles. Does it make sense to represent it as a single series with a frequency of 51 (51 days per cycle, including day 0), like this?

library(forecast)  # provides ggseasonplot()

vote.ts <- ts(votes$total_votes, start = -50, frequency = 51)
ggseasonplot(vote.ts)

My only hesitation is that day "0" of one cycle has no relationship to day "-50" of the next cycle. I'm also wondering whether there's a better way to represent the natural 7-day frequency in the data when forecasting daily returns (rather than cumulative totals).
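
For reference, here is a minimal sketch of how daily returns could be derived from the cumulative totals without letting one cycle bleed into the next. It assumes the column names used in the plot code (cycle, days_to_election, total_votes); the numbers are made up for illustration and are not the real data.

```r
# Synthetic stand-in for votes.csv (made-up numbers, for illustration only).
set.seed(1)
votes <- expand.grid(days_to_election = -50:0,
                     cycle = c(2014, 2016, 2018, 2020))
votes$total_votes <- ave(rpois(nrow(votes), 100), votes$cycle, FUN = cumsum)

# Sort so that differencing runs in day order inside each cycle.
votes <- votes[order(votes$cycle, votes$days_to_election), ]

# Daily returns: the first day of a cycle keeps its cumulative total,
# every later day is the first difference within that same cycle.
votes$daily_votes <- ave(votes$total_votes, votes$cycle,
                         FUN = function(x) c(x[1], diff(x)))

head(votes)
```

Differencing per cycle, rather than on the concatenated series, means day -50 of one cycle is never compared with day 0 of the previous one.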

In other words, there are nested frequencies of "51" (number of days in a cycle) and "7" (days of the week). Or is there a better way to approach this dataset, without thinking of a 51-day cycle as the frequency?
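
One possible way to capture the 7-day pattern without a seasonal frequency at all is to derive the weekday from days_to_election, since day 0 is always a Tuesday. The sketch below is only an illustration of that idea: the data is made up, and daily_votes, weekday, and weekday_means are hypothetical names.

```r
# Made-up daily counts; the real values would come from votes.csv.
set.seed(1)
daily <- expand.grid(days_to_election = -50:0,
                     cycle = c(2014, 2016, 2018, 2020))
daily$daily_votes <- rpois(nrow(daily), 100)

# Election day (offset 0) is a Tuesday, so the offset modulo 7 fixes the weekday.
# In R, %% returns a non-negative remainder for a positive modulus,
# e.g. -1 %% 7 is 6, which maps to Monday here.
weekday_names <- c("Tue", "Wed", "Thu", "Fri", "Sat", "Sun", "Mon")
daily$weekday <- factor(weekday_names[daily$days_to_election %% 7 + 1],
                        levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

# Average daily returns by weekday, pooled over all cycles.
weekday_means <- tapply(daily$daily_votes, daily$weekday, mean)
weekday_means
```

With this layout, each cycle is its own short series indexed by days_to_election, and the weekday factor could serve as a covariate in whatever model is used.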
