Plotting a Time Series in RStudio

I am trying to plot a time series of daily discharge from 2010 to 2022, however, the plot does not look correct.

I first created a variable for discharge using the code: Discharge <- PrecipDischarge.dat$Discharge

I then created a time series using the code:

discharge.ts <- ts(Discharge, start = c(2010,5), end = c(2022,10), frequency = 184)

I then plotted the discharge time series using the code:

plot.ts(discharge.ts, xlab = "Date", ylab = "Daily Discharge (m3/s)")

When I plot the time series, it does not appear to reflect my data and instead follows a cyclic pattern (e.g., the section of the plot from 2012 to 2013 is identical to the section from 2018 to 2019, and the section from 2013 to 2014 is identical to the section from 2019 to 2020).

I have plotted the same data in Excel using a line graph, and both the data and the plot look correct. Why am I unable to produce the same plot in R?

I tried looking at the raw data to ensure that this cyclic pattern isn't actually present in the time series, and plotted the data in Excel to see if I got a similar plot. The Excel plot appears correct, and very different from my results in R.

Does anyone know why the R plot is not representing the true data?

TIA

Is this because data is every other day? Could you add the data in a reprex?

Frequency = 184 because the data collected for each year spans from May 1 through October 31 (184 days). Every 184 days is essentially another cycle. What do you mean by a reprex? I am new to R.

A reprex (see the FAQ) is a minimal reproducible example. This is simple enough that it's not strictly necessary, but I need to understand whether Discharge records anything for the off-season, either NA or 0.

The frequency should have no effect other than changing the scaling of the x-axis.

# example data
data("varve",package = "astsa")
plot(varve)

faux <- ts(as.vector(varve), start = c(2010,5))
plot(faux)
faux <- ts(as.vector(varve), start = c(2010,5), frequency = 1)
plot(faux)

faux <- ts(as.vector(varve), start = c(2010,5), frequency = 365)
plot(faux)

faux <- ts(as.vector(varve), start = c(2010,5), frequency = 184)
plot(faux)

Created on 2023-08-11 with reprex v2.0.2

I have included a portion of my data (note that I have removed and redacted some data to ensure the privacy of my data).

PrecipDischarge.dat
Date Discharge Precipitation DailyChange
1 2010-05-01 0.941 2.8 1.000e-11
2 2010-05-02 0.528 28.8 -4.130e-01
8 2010-05-08 0.500 0.0 -1.390e-01
9 2010-05-09 0.381 0.0 -1.190e-01
10 2010-05-10 0.320 0.0 -6.100e-02
11 2010-05-11 0.280 0.0 -4.000e-02
12 2010-05-12 0.250 0.0 -3.000e-02
20 2010-05-20 0.289 0.0 -2.160e-01
21 2010-05-21 0.248 0.0 -4.100e-02
31 2010-05-31 0.115 0.2 -4.700e-02
37 2010-06-06 0.272 0.0 1.000e-11
38 2010-06-07 0.342 0.0 7.000e-02
39 2010-06-08 1.764 0.0 1.422e+00
52 2010-06-21 0.144 0.6 -2.700e-02
53 2010-06-22 0.116 0.0 -2.800e-02
54 2010-06-23 1.255 15.8 1.139e+00
55 2010-06-24 1.002 0.0 -2.530e-01
56 2010-06-25 0.250 0.0 -7.520e-01
57 2010-06-26 0.181 0.0 -6.900e-02
58 2010-06-27 2.820 24.0 2.639e+00
59 2010-06-28 0.329 0.0 -2.491e+00
60 2010-06-29 0.282 0.0 -4.700e-02
114 2010-08-22 0.061 0.0 -3.600e-02
115 2010-08-23 0.042 0.0 -1.900e-02
116 2010-08-24 0.035 0.0 -7.000e-03
124 2010-09-01 0.034 0.0 -7.000e-03
125 2010-09-02 0.096 4.8 6.200e-02
126 2010-09-03 1.820 34.4 1.724e+00
139 2010-09-16 0.113 0.2 -2.100e-02
140 2010-09-17 0.203 3.8 9.000e-02
141 2010-09-18 0.201 0.0 -2.000e-03
142 2010-09-19 0.145 0.0 -5.600e-02
143 2010-09-20 0.111 0.0 -3.400e-02
170 2010-10-17 0.088 0.0 0.000e+00
171 2010-10-18 0.078 0.4 -1.000e-02
172 2010-10-19 0.076 0.0 -2.000e-03
173 2010-10-20 0.076 0.2 0.000e+00
174 2010-10-21 0.074 0.0 -2.000e-03
194 2011-05-10 0.018 0.0 -8.000e-03
195 2011-05-11 0.014 0.0 -4.000e-03
196 2011-05-12 0.036 0.0 2.200e-02
236 2011-06-21 0.045 0.0 -5.500e-02
237 2011-06-22 0.035 22.6 -1.000e-02
238 2011-06-23 0.041 1.8 6.000e-03
243 2011-06-28 0.068 0.0 -4.200e-02
244 2011-06-29 0.070 0.0 2.000e-03
245 2011-06-30 0.022 0.0 -4.800e-02
246 2011-07-01 0.135 5.2 1.130e-01
247 2011-07-02 0.112 0.8 -2.300e-02
248 2011-07-03 0.101 0.0 -1.100e-02
249 2011-07-04 0.104 12.8 3.000e-03
250 2011-07-05 0.074 5.2 -3.000e-02

The above data has been read into R using the read.csv() command, and I named the object PrecipDischarge.dat, as reflected in the code from my previous post. I am not sure why the data is plotting the way that it is. The data is not perfectly cyclical as the plot suggests. Any ideas why it is plotting in a cyclical fashion and not representing the true data?

Thanks.

These data have frequent gaps. I had expected that there would be daily observations during the 184-day period. I can't reproduce your plot exactly, although I got something similar.

What's happening is that ts() is taking the 52 observations and recycling them to make 2,214 to fill out the start and end dates provided.
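The recycling is easy to see with a toy vector (these numbers are mine, purely for illustration): here ts() is asked to span twelve quarters but given only five values, so it repeats them to fill the slots.

```r
# five values, but a start/end/frequency that demands twelve slots
x <- c(1, 2, 3, 4, 5)
x.ts <- ts(x, start = c(2010, 1), end = c(2012, 4), frequency = 4)

length(x.ts)      # 12
as.vector(x.ts)   # 1 2 3 4 5 1 2 3 4 5 1 2 -- the data, recycled
```

The same thing happens at larger scale with 52 observations stretched over 2,214 slots.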

Ok, thank you. Do you know if there is any way to keep ts() from recycling the observations? I want to produce a single time series from 2010 to 2022 that includes only the months of May through October of each year. For example, after October 31, 2010, I want the time series to skip to May 1, 2011; after October 31, 2011, to May 1, 2012; and so on through May 1, 2022.

Is there any way of reproducing my data and essentially cutting the months of November through April out of the dataset?

Hi everyone, I am just wondering if there is any way of producing a time series that contains frequent gaps. Since I am dealing with rainfall and streamflow data, I want to cut out the winter months and deal with summer (May through October) months only.

When I did so, the time series is taking the first 52 observations and repeating them over and over. Is there any way of creating a time series that goes from May through October of one year, then skips to May through October of the next year, without recycling previous data to fill the gaps in the start and end dates of the time series?

TIA

Consider plotting each season separately and combining them for display purposes with {patchwork}. Alternatively, give the frequency as 365 and the start as c(year, 5), and it will not try to plot Jan-Apr and Dec.
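A minimal sketch of the {patchwork} idea, assuming the package is installed (the tiny data frame here is a stand-in for PrecipDischarge.dat, with values taken from the sample posted above):

```r
library(ggplot2)
library(patchwork)

# a tiny stand-in for PrecipDischarge.dat: two seasons, a few days each
dat <- data.frame(
  Date = as.Date(c("2010-05-01", "2010-06-08", "2010-10-21",
                   "2011-05-10", "2011-07-01", "2011-07-05")),
  Discharge = c(0.941, 1.764, 0.074, 0.018, 0.135, 0.074)
)

# one ggplot panel per May-October season, keyed by year
panels <- lapply(split(dat, format(dat$Date, "%Y")), function(d)
  ggplot(d, aes(Date, Discharge)) +
    geom_line() +
    ggtitle(format(d$Date[1], "%Y"))
)

# stack the per-season panels vertically for display
wrap_plots(panels, ncol = 1)
```

Each panel then uses its own date range, so the Nov-Apr gap never appears on any axis.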

Starting up top with your issue: where your time is unevenly sampled, plotting with ggplot with a true date field on the x-axis will simply place the points in the correct locations; i.e., it doesn't rely, like conventional ts/plot.ts, on a rigid sampling regime. For example:

library(tidyverse)
some_data <- readr::read_delim(file = I(
  "Date Discharge Precipitation DailyChange
2010-05-01 0.941 2.8 1.000e-11
2010-05-02 0.528 28.8 -4.130e-01
2010-05-08 0.500 0.0 -1.390e-01
2010-05-09 0.381 0.0 -1.190e-01
2010-05-10 0.320 0.0 -6.100e-02
2010-05-11 0.280 0.0 -4.000e-02
2010-05-12 0.250 0.0 -3.000e-02
2010-05-20 0.289 0.0 -2.160e-01
2010-05-21 0.248 0.0 -4.100e-02
2010-05-31 0.115 0.2 -4.700e-02
2010-06-06 0.272 0.0 1.000e-11
2010-06-07 0.342 0.0 7.000e-02
2010-06-08 1.764 0.0 1.422e+00
2010-06-21 0.144 0.6 -2.700e-02
2010-06-22 0.116 0.0 -2.800e-02
2010-06-23 1.255 15.8 1.139e+00
2010-06-24 1.002 0.0 -2.530e-01
2010-06-25 0.250 0.0 -7.520e-01
2010-06-26 0.181 0.0 -6.900e-02
2010-06-27 2.820 24.0 2.639e+00
2010-06-28 0.329 0.0 -2.491e+00
2010-06-29 0.282 0.0 -4.700e-02
2010-08-22 0.061 0.0 -3.600e-02
2010-08-23 0.042 0.0 -1.900e-02
2010-08-24 0.035 0.0 -7.000e-03
2010-09-01 0.034 0.0 -7.000e-03
2010-09-02 0.096 4.8 6.200e-02
2010-09-03 1.820 34.4 1.724e+00
2010-09-16 0.113 0.2 -2.100e-02
2010-09-17 0.203 3.8 9.000e-02
2010-09-18 0.201 0.0 -2.000e-03
2010-09-19 0.145 0.0 -5.600e-02
2010-09-20 0.111 0.0 -3.400e-02
2010-10-17 0.088 0.0 0.000e+00
2010-10-18 0.078 0.4 -1.000e-02
2010-10-19 0.076 0.0 -2.000e-03
2010-10-20 0.076 0.2 0.000e+00
2010-10-21 0.074 0.0 -2.000e-03
2011-05-10 0.018 0.0 -8.000e-03
2011-05-11 0.014 0.0 -4.000e-03
2011-05-12 0.036 0.0 2.200e-02
2011-06-21 0.045 0.0 -5.500e-02
2011-06-22 0.035 22.6 -1.000e-02
2011-06-23 0.041 1.8 6.000e-03
2011-06-28 0.068 0.0 -4.200e-02
2011-06-29 0.070 0.0 2.000e-03
2011-06-30 0.022 0.0 -4.800e-02
2011-07-01 0.135 5.2 1.130e-01
2011-07-02 0.112 0.8 -2.300e-02
2011-07-03 0.101 0.0 -1.100e-02
2011-07-04 0.104 12.8 3.000e-03
2011-07-05 0.074 5.2 -3.000e-02"
), delim = " ")

# ggplot chart
select(
  some_data,
  Date, Discharge
) |> ggplot(aes(
  x = Date,
  y = Discharge
)) +
  geom_point() +
  geom_line() +
  geom_smooth()

What about conventional ts/plot.ts? Well, you could pad your vector so that it has explicit missing entries and is therefore regularly sampled, at least as far as ts/plot.ts is concerned.


# date range
(dr <- range(some_data$Date))
full_seq <- seq(
  from = min(dr),
  to = max(dr),
  by = 1L
)

(vec_info <- enframe(full_seq,
  value = "Date"
) |>
  left_join(some_data) |>
  select(
    Date,
    Discharge
  ) |> deframe()
)

discharge.ts <- ts(vec_info, frequency = 1)
plot.ts(discharge.ts, xlab = "Date", ylab = "Daily Discharge (m3/s)")

Obviously this isn't beautiful given the sheer volume of absent data, but you could come up with ways of filling the gaps, such as line fitting / smoothing functions and the like.
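For example, the interior gaps in the padded vector could be filled by linear interpolation with base R's approx() (a sketch on a small stand-in vector; whether interpolating across such long gaps is defensible is another question):

```r
# a small stand-in for the padded vector built above: values with interior NAs
vec_info <- c(0.941, NA, NA, 0.500, 0.381, NA, 0.280)

# linearly interpolate the NAs from the neighbouring observed values
idx <- seq_along(vec_info)
obs <- !is.na(vec_info)
filled <- approx(idx[obs], vec_info[obs], xout = idx)$y

filled  # interior NAs replaced; e.g. position 6 becomes (0.381 + 0.280) / 2
```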

Hi Brant
I am seeing your issue and its exactly the same issue I am experiencing with my time series data. How have you managed to overcome missing months for the five-year period. I have a similar surveillance system that collects data only from May to October every year. It would be interesting to understand how you proceeded.

Cheers Maliz

I misapprehended the original question by implicitly assuming that the problem was only that November-April data was missing. If I now understand correctly, not only is the data collected only during the May-October season, but it is also collected only on days with precipitation and the following day.

It is possible to impute missing data, but not when such a huge proportion is missing. It's also possible to use methods designed for irregular time series, but it is doubtful they will be of use when the regular stretches are separated by very long intervals. Here, it just doesn't rain all the time, and there's nothing to be done about that.

Let's step back and ask why we do time series analysis on regular and more or less complete data in the first place.

To begin, suppose we did have complete data on precipitation and discharge, and we are interested in the increase of discharge on day 3 over day 1, given the precipitation on day 2. So, more rain, more flow; makes sense. But how much? The natural instinct is ordinary least squares regression, which is often unreasonably effective, but only if we are careful to observe its limitations. What will probably bite first is violation of the assumption of independence of the residuals. Fortunately, there is an arsenal of time series algorithms with varying degrees of effectiveness in dealing with autocorrelation.
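In R terms, that first instinct might look like the following, on simulated data (entirely made up here, just to show the shape of the regression and the residual check):

```r
set.seed(1)

# toy daily series: discharge responds to the previous day's rain plus noise
n <- 100
precip <- rexp(n, rate = 0.2)           # mm of rain per day
precip_lag1 <- c(NA, head(precip, -1))  # yesterday's precipitation
discharge <- 0.1 + 0.05 * ifelse(is.na(precip_lag1), 0, precip_lag1) +
  rnorm(n, sd = 0.02)

# ordinary least squares: discharge on lagged precipitation
fit <- lm(discharge ~ precip_lag1)
coef(fit)  # the slope should land near the true 0.05

# residual autocorrelation is what the time-series machinery then addresses
acf(residuals(fit), plot = FALSE)
```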

That's the visible part of the iceberg. Below the surface is the possibility that one or more of the processes generating the variation are unknown, and analyzing the variation may provide some insight into those processes. But wait. We have a quite good, well-tried model of riparian processes: discharge is proportional to watershed precipitation, plus down-gradient groundwater inflow net of aquifer recharge, less soil moisture replenishment, less evapotranspiration, less water supply withdrawal, plus wastewater discharge. Plus, of course, random variation.

But wait again. There are few watersheds for which we as hydrologists are so fortunate as to possess hard data on all these factors.

Enter the Reverend Bayes. We may not know the relative contributions of the hidden processes to the resulting discharge, but we can estimate Bayesian priors and calculate posteriors. There are several methods, including:

  1. Bayesian Model Averaging (BMA) with multiple prior structures: used for rainfall-runoff modeling; combines different prior structures to improve prediction accuracy.
  2. Hybrid Bayesian watershed modeling: assesses interannual variability in nitrogen sourcing and retention in watersheds, providing insight into the environmental impact of nitrogen pollution.
  3. Hierarchical Bayesian models: account for spatial and temporal structure in discharge and concentration data, allowing detection of the effects of stormwater control measures on watershed discharge.
  4. Data transformation in Bayesian inference: evaluates the effects of data transformations on Bayesian inference of watershed discharge, aiming to improve understanding of hydrological behavior.
  5. Bayesian and physics-informed machine learning models: used for streamflow simulation in data-scarce basins, addressing the challenges of data quality and availability in rural watersheds.
