Tackling challenging semi-weekly or weekly data with gaps

I am working on ferry itineraries (i.e. tickets sold on a specific itinerary). Consider the following scenarios:

  1. itinerary taken only on Tuesdays and Thursdays every week (but with gaps, depending on the weather);
  2. itinerary taken daily, but only during 4 months a year (from June to September, then again the following June, etc.).

How do you declare such series as ts objects in R? For example, in scenario 1, will it work if I set frequency = 2*52? And how does R handle the gaps (i.e. trips cancelled due to bad weather, leaving missing Tuesdays or Thursdays)? Any feedback would be much appreciated. Thanks in advance!
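A quick base-R check of the frequency = 2*52 assumption (a sketch; the helper slots_per_year is hypothetical, not from the thread): it counts the Tuesday/Thursday trip slots in each calendar year, using the POSIXlt wday field (0 = Sunday) so the result does not depend on the locale.

```r
# Hypothetical helper: count Tuesday/Thursday trip slots in one calendar year
slots_per_year <- function(year) {
  days <- seq(as.Date(sprintf("%d-01-01", year)),
              as.Date(sprintf("%d-12-31", year)), by = "day")
  # POSIXlt wday: 0 = Sunday, 2 = Tuesday, 4 = Thursday
  sum(as.POSIXlt(days)$wday %in% c(2, 4))
}

sapply(2014:2016, slots_per_year)

# Long-run average number of Tuesday/Thursday slots per year
2 * 365.25 / 7
```

For 2014 to 2016 this gives 104, 105 and 104 slots, and the long-run average is about 104.36, so frequency = 104 is an approximation: a yearly pattern would drift by roughly one slot every three years.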

Referred here by Forecasting: Principles and Practice, by Rob J Hyndman and George Athanasopoulos

Hi,

Could you elaborate a bit more on what your ideal dataset would look like? Preferably, create an example dataset and go a bit deeper into the type of analysis you want to do on it. Time series can be represented in many ways and there are many ways of analyzing them, but it all depends on the data and the goals.

It is best to create a reprex (reproducible example), following the forum's reprex guide.

Good luck,
PJ

OK, yes, I am a newcomer! I am working on that and will get back to you ASAP. Thanks!

Below is an example of a series I have (scenario 1): an itinerary run on Tuesdays and Thursdays, but not on all of them (some trips were cancelled due to bad weather).

ticket_semiweekly <- data.frame(stringsAsFactors=FALSE,
                                ticket = c(277, 178, 255, 368, 267, 373, 100, 120, 190, 337, 392),
                                tripdate = c("2014-12-16", "2014-12-18", "2014-12-23", "2014-12-30",
                                             "2015-01-06", "2015-01-08", "2015-01-15", "2015-01-20",
                                             "2015-01-22", "2015-01-27", "2015-01-29"),
                                day = c("Tuesday", "Thursday", "Tuesday", "Tuesday", "Tuesday",
                                        "Thursday", "Thursday", "Tuesday", "Thursday", "Tuesday",
                                        "Thursday")
)
 
# below I try to convert my series to a ts object, but I am not sure how:
ts(ticket_semiweekly$ticket, start = c(2014, 12, 16), frequency = 2*52)
#> Time Series:
#> Start = c(2014, 12) 
#> End = c(2014, 22) 
#> Frequency = 104 
#>  [1] 277 178 255 368 267 373 100 120 190 337 392
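One possible way to handle the gaps in scenario 1 (a sketch, not something proposed in the thread): ts() stores values at equally spaced positions and knows nothing about dates, so a cancelled trip has to be inserted explicitly as NA; otherwise every later trip shifts into the wrong slot. Note also that ts() uses only the first two elements of start and reads c(2014, 12) as the 12th of 104 periods in 2014, not as December, which is why the output above prints Start = c(2014, 12). The code below builds the full Tuesday/Thursday calendar in base R and merges the observed trips onto it; the names schedule, full and tickets_ts are illustrative, and the day column is dropped because it can be derived from the date.

```r
# Scenario 1 data, with dates parsed as Date (the day column is omitted)
ticket_semiweekly <- data.frame(
  ticket   = c(277, 178, 255, 368, 267, 373, 100, 120, 190, 337, 392),
  tripdate = as.Date(c("2014-12-16", "2014-12-18", "2014-12-23", "2014-12-30",
                       "2015-01-06", "2015-01-08", "2015-01-15", "2015-01-20",
                       "2015-01-22", "2015-01-27", "2015-01-29"))
)

# Every calendar day in the observed range
all_days <- seq(min(ticket_semiweekly$tripdate),
                max(ticket_semiweekly$tripdate), by = "day")

# Keep scheduled days only (POSIXlt wday: 2 = Tuesday, 4 = Thursday)
schedule <- data.frame(tripdate = all_days[as.POSIXlt(all_days)$wday %in% c(2, 4)])

# Left join: scheduled days without a recorded trip get ticket = NA
full <- merge(schedule, ticket_semiweekly, by = "tripdate", all.x = TRUE)

# Equally spaced Tue/Thu slots; the dates themselves stay in full$tripdate
tickets_ts <- ts(full$ticket, frequency = 104)
```

The regularized series has 14 slots, with NA on 2014-12-25, 2015-01-01 and 2015-01-13. Whether to impute those values or use a model that tolerates missing values is a separate modelling decision.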

The second example below refers to a series (scenario 2) where data are available daily, but only for 3 months each year. The NA rows here just stand in for the remaining days up to the end of each month, for which I also have data. But again, there may be gaps (cancelled or unscheduled trips).

data.frame(
    ticket = c(277, 178, 255, 368, 267, NA, 100, 120, 190, 337, 392, 200, NA,
               300, 290, 260, 370, 290, NA, NA, 120, 150, 210, 347, 395, 219,
               NA, 200, 205, 200, 390, 400, 240, NA, 340, 200, 285, 400, 300,
               260, NA, 140, 160),
    tripdate = c("2014-07-01", "2014-07-02", "2014-07-03", "2014-07-04",
                 "2014-07-05", NA, "2014-07-31", "2014-08-01", "2014-08-02",
                 "2014-08-03", "2014-08-04", "2014-08-05", NA, "2014-08-30",
                 "2014-09-01", "2014-09-02", "2014-09-03", "2014-09-04", "2014-09-05",
                 NA, "2014-09-30", "2015-07-01", "2015-07-02", "2015-07-03",
                 "2015-07-04", "2015-07-05", NA, "2015-07-31", "2015-08-01",
                 "2015-08-02", "2015-08-03", "2015-08-04", "2015-08-05", NA,
                 "2015-08-30", "2015-09-01", "2014-08-05", "2015-09-03", "2015-09-04",
                 "2015-09-01", NA, "2015-09-30", "2016-07-01")
)
#>    ticket   tripdate
#> 1     277 2014-07-01
#> 2     178 2014-07-02
#> 3     255 2014-07-03
#> 4     368 2014-07-04
#> 5     267 2014-07-05
#> 6      NA       <NA>
#> 7     100 2014-07-31
#> 8     120 2014-08-01
#> 9     190 2014-08-02
#> 10    337 2014-08-03
#> 11    392 2014-08-04
#> 12    200 2014-08-05
#> 13     NA       <NA>
#> 14    300 2014-08-30
#> 15    290 2014-09-01
#> 16    260 2014-09-02
#> 17    370 2014-09-03
#> 18    290 2014-09-04
#> 19     NA 2014-09-05
#> 20     NA       <NA>
#> 21    120 2014-09-30
#> 22    150 2015-07-01
#> 23    210 2015-07-02
#> 24    347 2015-07-03
#> 25    395 2015-07-04
#> 26    219 2015-07-05
#> 27     NA       <NA>
#> 28    200 2015-07-31
#> 29    205 2015-08-01
#> 30    200 2015-08-02
#> 31    390 2015-08-03
#> 32    400 2015-08-04
#> 33    240 2015-08-05
#> 34     NA       <NA>
#> 35    340 2015-08-30
#> 36    200 2015-09-01
#> 37    285 2014-08-05
#> 38    400 2015-09-03
#> 39    300 2015-09-04
#> 40    260 2015-09-01
#> 41     NA       <NA>
#> 42    140 2015-09-30
#> 43    160 2016-07-01
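A similar sketch for scenario 2 (again an assumption, not from the thread): drop the placeholder NA rows, check for repeated dates, rebuild the full daily calendar for each season, and merge. In the example data, row 37 (2014-08-05) repeats the date of row 12 and row 40 (2015-09-01) repeats row 36; they may be typos for 2015-09-02 and 2015-09-05, so here the first occurrence is simply kept. The example covers July to September, so each season is treated as one cycle of 92 days (frequency = 92); this ignores the off-season entirely and assumes every season has the same length (a June to September season would be 122 days). The names obs, dup_dates, season, full and season_ts are illustrative.

```r
ticket_daily <- data.frame(
  ticket = c(277, 178, 255, 368, 267, NA, 100, 120, 190, 337, 392, 200, NA,
             300, 290, 260, 370, 290, NA, NA, 120, 150, 210, 347, 395, 219,
             NA, 200, 205, 200, 390, 400, 240, NA, 340, 200, 285, 400, 300,
             260, NA, 140, 160),
  tripdate = as.Date(c("2014-07-01", "2014-07-02", "2014-07-03", "2014-07-04",
                       "2014-07-05", NA, "2014-07-31", "2014-08-01", "2014-08-02",
                       "2014-08-03", "2014-08-04", "2014-08-05", NA, "2014-08-30",
                       "2014-09-01", "2014-09-02", "2014-09-03", "2014-09-04", "2014-09-05",
                       NA, "2014-09-30", "2015-07-01", "2015-07-02", "2015-07-03",
                       "2015-07-04", "2015-07-05", NA, "2015-07-31", "2015-08-01",
                       "2015-08-02", "2015-08-03", "2015-08-04", "2015-08-05", NA,
                       "2015-08-30", "2015-09-01", "2014-08-05", "2015-09-03", "2015-09-04",
                       "2015-09-01", NA, "2015-09-30", "2016-07-01"),
                     format = "%Y-%m-%d")
)

# Drop placeholder rows (NA date) that stand in for days left out of the example
obs <- ticket_daily[!is.na(ticket_daily$tripdate), ]

# Repeated dates (2014-08-05 and 2015-09-01 in this example)
dup_dates <- obs$tripdate[duplicated(obs$tripdate)]
# Assumption: keep the first occurrence until the data are corrected
obs <- obs[!duplicated(obs$tripdate), ]

# Full July-September calendar for each observed year, up to the last trip
years <- sort(unique(as.POSIXlt(obs$tripdate)$year + 1900L))
season <- do.call(c, lapply(years, function(y)
  seq(as.Date(sprintf("%d-07-01", y)), as.Date(sprintf("%d-09-30", y)), by = "day")))
season <- season[season <= max(obs$tripdate)]

# Left join: days with no recorded ticket count become NA (gaps)
full <- merge(data.frame(tripdate = season), obs, by = "tripdate", all.x = TRUE)

# Assumption: each 92-day season is one seasonal cycle
season_ts <- ts(full$ticket, start = c(2014, 1), frequency = 92)
```

With this data the calendar has 185 days (two full seasons plus 2016-07-01), of which 34 have a ticket count.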

In both cases, my problems are how to take the gaps into account and how to declare the series as time series.
The goal is to make forecasts for the coming Tuesdays and Thursdays (scenario 1) or for the days (or weeks) of those months next year (scenario 2).
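To set up the forecast horizon, a small base-R sketch (the helpers next_slots and season_days are hypothetical) that lists the dates to forecast: the next n Tuesday/Thursday trips for scenario 1, and every day of a July to September season for scenario 2, matching the months in the example data.

```r
# Hypothetical helper: the next n scheduled Tuesday/Thursday dates after last_date
next_slots <- function(last_date, n) {
  # 7 * n consecutive days contain exactly n Tuesdays and n Thursdays
  candidates <- last_date + seq_len(7 * n)
  is_trip_day <- as.POSIXlt(candidates)$wday %in% c(2, 4)  # 2 = Tue, 4 = Thu
  candidates[is_trip_day][seq_len(n)]
}

# Hypothetical helper: every day of the July-September season in a given year
season_days <- function(year) {
  seq(as.Date(sprintf("%d-07-01", year)), as.Date(sprintf("%d-09-30", year)), by = "day")
}

next_slots(as.Date("2015-01-29"), 4)  # horizon after the last scenario 1 trip
length(season_days(2016))             # days in one season
```

Starting after 2015-01-29 (a Thursday), next_slots returns 2015-02-03, 2015-02-05, 2015-02-10 and 2015-02-12. These dates can be paired with forecasts produced from the regularized series.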

I hope it is clear now, despite the lengthy message.
