I am a Ph.D. candidate trying to determine the effect of four rain gardens on stream discharge in response to precipitation depth and intensity and am not really sure how to proceed. I have been attempting this analysis for almost a year now and am not sure what I am doing. Just putting this out here to see if anyone has any advice.
I have a .csv file (shown below) containing precipitation and discharge data taken at 15-minute intervals for 12 years.
I first want to use the data from this .csv file to identify precipitation events. A precipitation event will be defined as a run of non-zero precipitation values in which any gaps of zero precipitation last less than 4 hours. For example, considering the data provided above, precipitation began at 2010-05-13 11:30:00 and stopped at 2010-05-13 21:00:00. Precipitation then began again at 2010-05-13 22:45:00 and continued until 2010-05-14 00:15:00. Because there is less than 4 hours between the time that the precipitation stopped (2010-05-13 21:00:00) and the time that it began again (2010-05-13 22:45:00), these should be treated as a single event.
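To make the rule concrete, here is a minimal base R sketch on made-up data (none of these values come from the actual file). It assumes that "less than 4 hours of zeros" means fewer than 16 consecutive zero readings, so two wet readings belong to the same event when their timestamps are at most 4 hours apart. The names `times`, `precip`, and `event_id` are placeholders.

```r
# Toy 15-minute series (made-up values, for illustration only)
times <- seq(as.POSIXct("2010-05-13 11:00:00", tz = "UTC"),
             by = "15 min", length.out = 80)
precip <- numeric(80)
precip[c(3, 4, 10)] <- c(0.2, 0.4, 0.2)  # first burst of rain
precip[c(17, 18)]   <- c(0.6, 0.2)       # resumes after 6 zero readings (1.5 h)
precip[c(60, 61)]   <- c(1.0, 0.4)       # resumes after 41 zero readings

# Keep only the wet readings and measure the time between consecutive ones
wet <- which(precip > 0)
gap_min <- as.numeric(diff(times[wet]), units = "mins")

# Assumption: a gap of more than 240 minutes between wet readings
# (16 or more zero readings in between) starts a new event
event_id <- cumsum(c(TRUE, gap_min > 240))

data.frame(time = times[wet], precip = precip[wet], event = event_id)
```

In this toy series the first five wet readings share event 1 and the last two form event 2.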
After precipitation events have been identified, I want to determine the duration of the event. For example, considering the data provided above, between 2010-05-13 11:30:00 and 2010-05-14 00:15:00 (my rainfall event defined in the paragraph above), there would be an event duration of 12 hr and 45 min (12.75 hr).
After determining the event duration, I want to determine the total precipitation depth that had fallen over the event duration and calculate the average rainfall intensity (calculated as total precipitation depth divided by event duration) for that event.
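The arithmetic for one event can be sketched in base R as follows. The data frame is made up, and the column names `time`, `precip_mm`, and `event` and the helper `summarise_event()` are hypothetical. It assumes the wet readings have already been labelled with an event number, and that duration runs from the first to the last wet reading, as in the 12.75 hr example above.

```r
# Made-up wet readings, already labelled with an event number
wet_obs <- data.frame(
  time = as.POSIXct(c("2010-05-13 11:30:00", "2010-05-13 21:00:00",
                      "2010-05-13 22:45:00", "2010-05-14 00:15:00",
                      "2010-05-20 06:00:00", "2010-05-20 08:00:00"),
                    tz = "UTC"),
  precip_mm = c(1.2, 0.8, 0.6, 0.4, 2.0, 1.0),
  event = c(1, 1, 1, 1, 2, 2)
)

# Hypothetical helper: one summary row per event
summarise_event <- function(d) {
  duration_hr <- as.numeric(difftime(max(d$time), min(d$time), units = "hours"))
  total_mm <- sum(d$precip_mm)
  data.frame(
    Event_Start_Date = min(d$time),
    Event_Duration   = duration_hr,            # hours
    Total_Precip     = total_mm,               # mm
    Avg_Intensity    = total_mm / duration_hr  # mm/hr
  )
}

events <- do.call(rbind, lapply(split(wet_obs, wet_obs$event), summarise_event))
events
```

Note that an event consisting of a single wet reading would have a duration of zero under this definition, so its average intensity would need separate handling.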
For each precipitation event I also want to determine the increase in stream discharge. This should be calculated as the difference between the discharge rate before precipitation began and the peak discharge rate during the precipitation event. I want to store all this data in a data object such as what is shown below.
In this data object "Event_Start_Date" is the date when the precipitation began, "Event_Duration" is the total duration of the rainfall event, "Total_Precip" is the total precipitation depth (in mm) that had fallen during the event, "Avg_Intensity" is the average rainfall intensity of the event, and "Increase_Discharge" is the increase in stream discharge from the point when the precipitation began to the peak discharge rate during the precipitation event.
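As a rough sketch of the "Increase_Discharge" calculation for a single event, in base R on made-up numbers: it assumes the "before" value is the discharge reading immediately preceding the first wet reading, which is one possible interpretation, and that the peak is searched only within the event itself. All variable names are placeholders.

```r
# Made-up 15-minute readings around one event
discharge <- c(0.30, 0.30, 0.31, 0.35, 0.48, 0.44, 0.38, 0.33)  # m3/s
precip    <- c(0,    0,    0.4,  1.0,  0.6,  0,    0,    0)     # mm

wet <- which(precip > 0)
event_start <- min(wet)
event_end   <- max(wet)

# Assumption: baseline is the reading just before the event starts
# (this requires the event not to begin on the very first row)
baseline <- discharge[event_start - 1]

# Peak discharge between the first and last wet reading
peak <- max(discharge[event_start:event_end])

increase_discharge <- peak - baseline
increase_discharge
```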
Using this new data object, I want to create a plot showing the relationship between precipitation depth (mm), average precipitation intensity (mm/hr), and the increase in stream discharge (m3/s), such as what is shown in the plot below. Note that these values are made up for illustrative purposes and are not representative of my data.
I have two suggestions: One is that we should help you post your data as the output of a dput() statement, and the other is that it might be good to retitle your post as "Identifying .... - part 1" and concentrate on just the first task:
Topics are generally focused questions about issues or obstacles someone is experiencing, so this is a lot to chew on for one topic. Of course, someone out there may be willing to tackle them all, but even so, the answers would be easier to find for someone asking similar questions if they're not buried in a very long post.
To start with this one, could you post a screenshot of the code you used to create the data you did share? Unless you don't need help with dput(), in which case could you post the output?
Next, identify all the dry periods that lasted at least four hours:
dry_periods |>
  # add column that enumerates the observations, place it before 'Date' column
  mutate(observation = row_number(), .before = Date) |>
  # find when each dry period begins and ends
  group_by(dry_period) |>
  mutate(
    # set start value to earliest index for dry periods, NA for rainy ones
    period_start = if_else(!rained, min(observation), NA),
    # similarly for end value
    period_end = if_else(!rained, max(observation), NA),
  ) |> # view()
  # we no longer need to operate period-wise, so can ungroup table
  ungroup() |>
  # calculate length of periods (in 15-minute intervals)
  mutate(period_length = period_end - period_start + 1) |>
  # flag when period is at least four hours
  mutate(in_long_period = period_length >= 16) |>
  # flag 'dry spells' of four hours or more
  mutate(in_dry_spell = in_long_period & !rained) |>
  # remove any columns that are no longer needed
  select(!contains(c('period'))) -> dry_spells
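As a cross-check on the flag above, the same idea can be sketched with base R's `rle()`, which is a different technique from the grouped `mutate()` pipeline. The `rained` vector here is made up, and the threshold of 16 readings mirrors the four-hour rule at 15-minute intervals.

```r
# Made-up indicator: 3 dry, 2 wet, 16 dry, 1 wet, 5 dry readings
rained <- c(rep(FALSE, 3), TRUE, TRUE, rep(FALSE, 16), TRUE, rep(FALSE, 5))

# Run-length encode the indicator: one entry per consecutive run
runs <- rle(rained)

# A run is a dry spell when it is dry and at least 16 readings long
long_dry <- !runs$values & runs$lengths >= 16

# Expand the per-run flag back to one value per observation
in_dry_spell <- rep(long_dry, runs$lengths)
which(in_dry_spell)
```

Only the run of 16 dry readings (observations 6 to 21) is flagged; the shorter dry runs at the start and end are not.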
Now, notice that rain events are exactly the periods that fill the gaps before or after the dry spells, with one exception: if the data starts with a dry period that isn't long enough to be a dry spell, it should also be considered a dry spell so that it's not included as part of the following rain event.
dry_spells |>
  # any observations that occur before any precipitation has been recorded
  # should be flagged as part of a dry spell
  mutate(in_dry_spell = in_dry_spell | cumsum(rained) == 0) |>
  # remove any columns that are no longer needed
  select(!contains(c('rain'))) -> dry_spells
dry_spells |> head()
#> # A tibble: 6 × 6
#>   observation Date       Time    Precipitation Discharge in_dry_spell
#>         <int> <chr>      <chr>           <dbl>     <dbl> <lgl>
#> 1           1 2010-05-01 0:00:00           0       0.299 TRUE
#> 2           2 2010-05-01 0:15:00           0       0.302 TRUE
#> 3           3 2010-05-01 0:30:00           0       0.305 TRUE
#> 4           4 2010-05-01 0:45:00           0.2     0.308 FALSE
#> 5           5 2010-05-01 1:00:00           0.2     0.312 FALSE
#> 6           6 2010-05-01 1:15:00           0       0.317 FALSE
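To see why `cumsum(rained) == 0` picks out only the dry observations at the very start of the data, here is a small base R illustration on a made-up `rained` vector shaped like the first rows of the output above (three dry readings, then rain).

```r
# Made-up indicator: three dry readings, then rain with short dry gaps
rained <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)

# The running count of wet readings stays at 0 until the first rain,
# so the comparison is TRUE only for the leading dry readings
leading_dry <- cumsum(rained) == 0
leading_dry
```

Later dry readings (positions 6 and 8 here) are not flagged, because the running count is already above zero by then.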
Finally, since an observation is part of a rain event exactly when it isn't part of a dry spell, we can apply the 'cryptid visit' approach again, this time to numbering rain events: