Identifying and Determining the Average Rainfall Intensity of Precipitation Events and Associated Increases in Streamflow Part 1

Hi,

This is a follow-up to my previous post, "Plotting the Relationship Between Precipitation Depth, Precipitation Intensity, and Increase in Streamflow".

I am a Ph.D. candidate trying to determine the effect of four rain gardens on stream discharge in response to precipitation depth and intensity and am not really sure how to proceed. I have been attempting this analysis for almost a year now and am not sure what I am doing. Just putting this out here to see if anyone has any advice.

I have a .csv file (shown below) containing precipitation and discharge data taken at 15-minute intervals for 12 years.

Date, Time, Precipitation, Discharge
2010-05-13,10:30:00,0,0.224
2010-05-13,10:45:00,0,0.225
2010-05-13,11:00:00,0,0.225
2010-05-13,11:15:00,0,0.225
2010-05-13,11:30:00,0.2,0.225
2010-05-13,11:45:00,0.2,0.225
2010-05-13,12:00:00,0.6,0.226
2010-05-13,12:15:00,0.6,0.226
2010-05-13,12:30:00,0.2,0.226
2010-05-13,12:45:00,0.4,0.226
2010-05-13,13:00:00,0.2,0.227
2010-05-13,13:15:00,0.4,0.235
2010-05-13,13:30:00,0.4,0.243
2010-05-13,13:45:00,0.4,0.256
2010-05-13,14:00:00,0.8,0.273
2010-05-13,14:15:00,0.4,0.294
2010-05-13,14:30:00,0.2,0.312
2010-05-13,14:45:00,0.2,0.33
2010-05-13,15:00:00,0.2,0.334
2010-05-13,15:15:00,0,0.338
2010-05-13,15:30:00,0.2,0.356
2010-05-13,15:45:00,0,0.356
2010-05-13,16:00:00,0,0.356
2010-05-13,16:15:00,0.2,0.356
2010-05-13,16:30:00,0.6,0.356
2010-05-13,16:45:00,0.2,0.356
2010-05-13,17:00:00,0.2,0.343
2010-05-13,17:15:00,0.4,0.343
2010-05-13,17:30:00,1.4,0.343
2010-05-13,17:45:00,0.6,0.374
2010-05-13,18:00:00,0.8,0.409
2010-05-13,18:15:00,0.8,0.5
2010-05-13,18:30:00,0.6,0.55
2010-05-13,18:45:00,0.4,0.602
2010-05-13,19:00:00,0.2,0.697
2010-05-13,19:15:00,0.2,0.733
2010-05-13,19:30:00,0.4,0.77
2010-05-13,19:45:00,0.2,0.777
2010-05-13,20:00:00,0,0.748
2010-05-13,20:15:00,0.2,0.711
2010-05-13,20:30:00,0.2,0.682
2010-05-13,20:45:00,0.2,0.634
2010-05-13,21:00:00,0.2,0.601
2010-05-13,21:15:00,0,0.582
2010-05-13,21:30:00,0,0.562
2010-05-13,21:45:00,0,0.558
2010-05-13,22:00:00,0,0.554
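
A minimal sketch of loading data in this layout and checking the 15-minute spacing, using only the first five rows shown above. The `datetime` column name and the `UTC` time zone are assumptions made for illustration (a fixed zone avoids daylight-saving jumps); with the real file, `read.csv()` would be given the file path instead of `text =`.

```r
# First five rows of the sample above, embedded as text so the sketch is self-contained
csv_text <- "Date,Time,Precipitation,Discharge
2010-05-13,10:30:00,0,0.224
2010-05-13,10:45:00,0,0.225
2010-05-13,11:00:00,0,0.225
2010-05-13,11:15:00,0,0.225
2010-05-13,11:30:00,0.2,0.225"

water <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Combine the Date and Time columns into one timestamp (time zone is an assumption)
water$datetime <- as.POSIXct(paste(water$Date, water$Time),
                             format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Minutes between consecutive readings; every step should be 15
step_min <- as.numeric(diff(water$datetime), units = "mins")
all(step_min == 15)
#> [1] TRUE
```

If this check fails on the full record, there are missing or duplicated timestamps, and any rule that counts rows instead of elapsed time would need adjusting.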

I first want to use the data from this .csv file to identify precipitation events. Precipitation events will be defined as the cumulative total of non-zero precipitation values with less than 4 hours of zero precipitation values in between. For example, considering the data provided above, precipitation began at 2010-05-13 11:30:00 and stopped 2010-05-13 21:00:00. Precipitation then began again at 2010-05-13 22:45:00 and continued until 2010-05-14 00:15:00. Because there is less than 4 hours between the time that the precipitation stopped (2010-05-13 21:00:00) and the time that the precipitation began (2010-05-13 22:45:00), these should be treated as a single event.

After precipitation events have been identified, I want to determine the duration of the event. For example, considering the data provided above, between 2010-05-13 11:30:00 and 2010-05-14 00:15:00 (my rainfall event defined in the paragraph above), there would be an event duration of 12 hr and 45 min (12.75 hr).
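
As a check on that arithmetic, the duration can be computed from the two timestamps with base R's `difftime()`. This is only a sketch: the variable names are illustrative and the `UTC` time zone is an assumption.

```r
# Start and end of the example event described above
event_start <- as.POSIXct("2010-05-13 11:30:00", tz = "UTC")
event_end   <- as.POSIXct("2010-05-14 00:15:00", tz = "UTC")

# Elapsed time between the first and last rainy reading, in hours
event_duration_hr <- as.numeric(difftime(event_end, event_start, units = "hours"))
event_duration_hr
#> [1] 12.75
```

Measured this way, an event consisting of a single reading has a duration of zero, so it may be worth deciding up front whether to add one 15-minute interval to every event.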

After determining the event duration, I want to determine the total precipitation depth that had fallen over the event duration and calculate the average rainfall intensity (calculated as total precipitation depth divided by event duration) for that event.

For each precipitation event I also want to determine the increase in stream discharge. This should be calculated as the difference between the discharge rate before precipitation began and the peak discharge rate during the precipitation event. I want to store all this data in a data object such as what is shown below.

Event_Start_Date, Event_Duration, Total_Precip, Avg_Intensity, Increase_Discharge

In this data object "Event_Start_Date" is the date when the precipitation began, "Event_Duration" is the total duration of the rainfall event, "Total_Precip" is the total precipitation depth (in mm) that had fallen during the event, "Avg_Intensity" is the average rainfall intensity of the event, and "Increase_Discharge" is the increase in stream discharge from the point when the precipitation began to the peak discharge rate during the precipitation event.

Using this new data object, I want to create a plot showing the relationship between precipitation depth (mm), average precipitation intensity (mm/hr), and the increase in stream discharge (m³/s), such as what is shown in the plot below. Note that these values are made up for illustrative purposes and are not representative of my data.

Does anyone have any advice as to how to illustrate this and determine the regression slope of each line? TIA.

Hi Brant,

I have two suggestions: one is that we should help you post your data as the output of a dput() statement, and the other is that it might be good to retitle your post as "Identifying .... - part 1" and concentrate on just the first task.

Topics are generally focused questions about issues or obstacles someone is experiencing, so this is a lot to chew on for one topic. Of course, someone out there may be willing to tackle them all, but even so, the answers would be easier to find for someone asking similar questions if they're not buried in a very long post.

To start with this one, could you post a screenshot of the code you used to create the data you did share? Unless you don't need help with dput(), in which case could you post the output?

Here is an approach that applies the code from the post How to enumerate intervals in a sequence, like periods when cryptids come to visit - #8 by dromano to enumerate both dry periods and rain events, using the data shared by you here. (Full reprex can be found at the end of this post.)

dput() output, with table saved as 'water_data':
structure(list(Date = c("2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-02", "2010-05-02", 
"2010-05-02", "2010-05-02"), Time = c("0:00:00", "0:15:00", "0:30:00", 
"0:45:00", "1:00:00", "1:15:00", "1:30:00", "1:45:00", "2:00:00", 
"2:15:00", "2:30:00", "2:45:00", "3:00:00", "3:15:00", "3:30:00", 
"3:45:00", "4:00:00", "4:15:00", "4:30:00", "4:45:00", "5:00:00", 
"5:15:00", "5:30:00", "5:45:00", "6:00:00", "6:15:00", "6:30:00", 
"6:45:00", "7:00:00", "7:15:00", "7:30:00", "7:45:00", "8:00:00", 
"8:15:00", "8:30:00", "8:45:00", "9:00:00", "9:15:00", "9:30:00", 
"9:45:00", "10:00:00", "10:15:00", "10:30:00", "10:45:00", "11:00:00", 
"11:15:00", "11:30:00", "11:45:00", "12:00:00", "12:15:00", "12:30:00", 
"12:45:00", "13:00:00", "13:15:00", "13:30:00", "13:45:00", "14:00:00", 
"14:15:00", "14:30:00", "14:45:00", "15:00:00", "15:15:00", "15:30:00", 
"15:45:00", "16:00:00", "16:15:00", "16:30:00", "16:45:00", "17:00:00", 
"17:15:00", "17:30:00", "17:45:00", "18:00:00", "18:15:00", "18:30:00", 
"18:45:00", "19:00:00", "19:15:00", "19:30:00", "19:45:00", "20:00:00", 
"20:15:00", "20:30:00", "20:45:00", "21:00:00", "21:15:00", "21:30:00", 
"21:45:00", "22:00:00", "22:15:00", "22:30:00", "22:45:00", "23:00:00", 
"23:15:00", "23:30:00", "23:45:00", "0:00:00", "0:15:00", "0:30:00", 
"0:45:00"), Precipitation = c(0, 0, 0, 0.2, 0.2, 0, 0, 0.2, 0.4, 
0, 0, 0.2, 0, 0, 0, 0, 0.2, 0, 0, 0.6, 0.4, 0.2, 0, 0, 0, 0, 
0, 0, 0, 0, 0.2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0), Discharge = c(0.299, 0.302, 
0.305, 0.308, 0.312, 0.317, 0.321, 0.326, 0.33, 0.339, 0.352, 
0.38, 0.41, 0.428, 0.424, 0.419, 0.415, 0.41, 0.411, 0.412, 0.414, 
0.415, 0.416, 0.44, 0.459, 0.465, 0.495, 0.495, 0.495, 0.495, 
0.495, 0.495, 0.471, 0.453, 0.44, 0.431, 0.422, 0.423, 0.423, 
0.424, 0.424, 0.425, 0.425, 0.426, 0.427, 0.427, 0.428, 0.428, 
0.429, 0.429, 0.43, 0.43, 0.431, 0.431, 0.432, 0.432, 0.433, 
0.433, 0.434, 0.434, 0.435, 0.441, 0.471, 0.527, 0.565, 0.562, 
0.559, 0.556, 0.552, 0.665, 0.892, 0.941, 0.937, 0.933, 0.928, 
0.924, 0.843, 0.812, 0.765, 0.729, 0.693, 0.672, 0.644, 0.617, 
0.611, 0.591, 0.578, 0.569, 0.559, 0.55, 0.54, 0.538, 0.536, 
0.534, 0.532, 0.53, 0.528, 0.526, 0.524, 0.521)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -100L)) -> water_data

First, enumerate all the contiguous periods without precipitation:

library(tidyverse)

water_data |> 
  # flag when it rained
  mutate(rained = Precipitation > 0) |> 
  # apply 'cryptid visit' approach to numbering dry periods
  mutate(dry_period = if_else(!rained, cumsum(rained), NA)) |> 
  mutate(dry_period = factor(dry_period) |> as.numeric()) -> dry_periods

dry_periods |> head()
#> # A tibble: 6 × 6
#>   Date       Time    Precipitation Discharge rained dry_period
#>   <chr>      <chr>           <dbl>     <dbl> <lgl>       <dbl>
#> 1 2010-05-01 0:00:00           0       0.299 FALSE           1
#> 2 2010-05-01 0:15:00           0       0.302 FALSE           1
#> 3 2010-05-01 0:30:00           0       0.305 FALSE           1
#> 4 2010-05-01 0:45:00           0.2     0.308 TRUE           NA
#> 5 2010-05-01 1:00:00           0.2     0.312 TRUE           NA
#> 6 2010-05-01 1:15:00           0       0.317 FALSE           2
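
To see why this numbering works, here is a small stand-alone sketch on a toy vector. It uses base R's `ifelse()` in place of `if_else()` so it runs without any packages; the toy `rained` values are made up purely to show the mechanics.

```r
# Toy flags: TRUE where a reading recorded rain
rained <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

# cumsum(rained) counts rainy readings so far, so it stays constant
# across each unbroken run of dry readings
raw_id <- ifelse(!rained, cumsum(rained), NA)
raw_id
#> [1]  0  0 NA NA  2 NA  3  3

# Converting to a factor and back relabels the distinct values as 1, 2, 3, ...
dry_period <- as.numeric(factor(raw_id))
dry_period
#> [1]  1  1 NA NA  2 NA  3  3
```

The raw labels (0, 2, 3) are already distinct for each dry run; the factor step only makes them consecutive.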

Next, identify all the dry periods that lasted at least four hours:

dry_periods |> 
  # add column that enumerates the observations, place it before 'Date' column
  mutate(observation = row_number(), .before = Date) |> 
  # find when each dry period begins and ends
  group_by(dry_period) |> 
  mutate(
    # set start value to earliest index for dry periods, NA for rainy ones
    period_start = if_else(!rained, min(observation), NA),
    # similarly for end value
    period_end = if_else(!rained, max(observation), NA),
    ) |> # view()
  # we no longer need to operate period-wise, so can ungroup table
  ungroup() |> 
  # calculate length of periods (in 15-minute intervals)
  mutate(period_length = period_end - period_start + 1) |> 
  # flag when period is at least four hours
  mutate(in_long_period = period_length >= 16) |>
  # flag 'dry spells' of four hours or more
  mutate(in_dry_spell = in_long_period & !rained) |> 
  # remove any columns that are no longer needed
  select(!contains(c('period'))) ->  dry_spells

dry_spells |> head()
#> # A tibble: 6 × 7
#>   observation Date       Time    Precipitation Discharge rained in_dry_spell
#>         <int> <chr>      <chr>           <dbl>     <dbl> <lgl>  <lgl>       
#> 1           1 2010-05-01 0:00:00           0       0.299 FALSE  FALSE       
#> 2           2 2010-05-01 0:15:00           0       0.302 FALSE  FALSE       
#> 3           3 2010-05-01 0:30:00           0       0.305 FALSE  FALSE       
#> 4           4 2010-05-01 0:45:00           0.2     0.308 TRUE   FALSE       
#> 5           5 2010-05-01 1:00:00           0.2     0.312 TRUE   FALSE       
#> 6           6 2010-05-01 1:15:00           0       0.317 FALSE  FALSE
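
For comparison, the same flag can be sketched with base R's run-length encoding, `rle()`. This is an alternative to the grouped start/end calculation above, not the method used in this post, and the toy `rained` vector is made up for illustration.

```r
# Toy flags: rain, 3 dry readings, rain, 16 dry readings, rain
rained <- c(TRUE, rep(FALSE, 3), TRUE, rep(FALSE, 16), TRUE)

# rle() collapses the vector into runs of identical values and their lengths
runs <- rle(rained)

# A run is a dry spell when it is dry and at least 16 readings long
# (16 readings x 15 minutes = 4 hours)
is_spell <- !runs$values & runs$lengths >= 16

# Expand the per-run flag back to one value per reading
in_dry_spell <- rep(is_spell, times = runs$lengths)
which(in_dry_spell)
#>  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
```

Both versions count rows, so they assume the record has no missing 15-minute timestamps; if it does, the threshold would need to be applied to elapsed time instead.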

Now, notice that rain events are exactly the periods that fill the gaps before or after the dry spells, with one exception: if the data starts with a dry period that isn't long enough to be a dry spell, it should also be considered a dry spell so that it's not included as part of the following rain event.

dry_spells |> 
  # any observations that occur before any precipitation has been recorded
  # should be flagged as part of a dry spell 
  mutate(in_dry_spell = in_dry_spell | cumsum(rained) == 0) |> 
  # remove any columns that are no longer needed
  select(!contains(c('rain'))) ->  dry_spells

dry_spells |> head()
#> # A tibble: 6 × 6
#>   observation Date       Time    Precipitation Discharge in_dry_spell
#>         <int> <chr>      <chr>           <dbl>     <dbl> <lgl>       
#> 1           1 2010-05-01 0:00:00           0       0.299 TRUE        
#> 2           2 2010-05-01 0:15:00           0       0.302 TRUE        
#> 3           3 2010-05-01 0:30:00           0       0.305 TRUE        
#> 4           4 2010-05-01 0:45:00           0.2     0.308 FALSE       
#> 5           5 2010-05-01 1:00:00           0.2     0.312 FALSE       
#> 6           6 2010-05-01 1:15:00           0       0.317 FALSE

Finally, since an observation is part of a rain event exactly when it isn't part of a dry spell, we can apply the 'cryptid visit' approach again, this time to numbering rain events:

dry_spells |> 
  mutate(rain_event = if_else(!in_dry_spell, cumsum(in_dry_spell), NA)) |> 
  mutate(rain_event = factor(rain_event) |> as.numeric()) -> rain_events

rain_events |> head()
#> # A tibble: 6 × 7
#>   observation Date       Time    Precipitation Discharge in_dry_spell rain_event
#>         <int> <chr>      <chr>           <dbl>     <dbl> <lgl>             <dbl>
#> 1           1 2010-05-01 0:00:00           0       0.299 TRUE                 NA
#> 2           2 2010-05-01 0:15:00           0       0.302 TRUE                 NA
#> 3           3 2010-05-01 0:30:00           0       0.305 TRUE                 NA
#> 4           4 2010-05-01 0:45:00           0.2     0.308 FALSE                 1
#> 5           5 2010-05-01 1:00:00           0.2     0.312 FALSE                 1
#> 6           6 2010-05-01 1:15:00           0       0.317 FALSE                 1

Created on 2024-04-15 with reprex v2.0.2
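
With the events numbered, the per-event table described in the original question could be sketched roughly as follows. This is a hedged illustration rather than a finished solution: it is written in base R so it runs without packages, `summarise_event()` is a hypothetical helper, the `UTC` time zone is an assumption, and the input is only the first nine readings of `water_data`, so the event is truncated and the numbers are illustrative.

```r
# First nine readings of water_data; rain_event follows the numbering above
# (NA = dry spell, 1 = first rain event)
events <- data.frame(
  Date = "2010-05-01",
  Time = c("0:00:00", "0:15:00", "0:30:00", "0:45:00", "1:00:00",
           "1:15:00", "1:30:00", "1:45:00", "2:00:00"),
  Precipitation = c(0, 0, 0, 0.2, 0.2, 0, 0, 0.2, 0.4),
  Discharge = c(0.299, 0.302, 0.305, 0.308, 0.312, 0.317, 0.321, 0.326, 0.33),
  rain_event = c(NA, NA, NA, 1, 1, 1, 1, 1, 1)
)
events$datetime <- as.POSIXct(paste(events$Date, events$Time),
                              format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Hypothetical helper: one summary row for the event with the given id
summarise_event <- function(id, data) {
  rows  <- which(data$rain_event == id)   # which() drops the NA rows
  first <- min(rows)
  # Discharge at the last reading before the event began
  # (falls back to the first event row if the record starts with rain)
  baseline <- data$Discharge[max(first - 1, 1)]
  duration_hr <- as.numeric(difftime(max(data$datetime[rows]),
                                     min(data$datetime[rows]),
                                     units = "hours"))
  total <- sum(data$Precipitation[rows])
  data.frame(
    Event_Start_Date   = data$datetime[first],
    Event_Duration     = duration_hr,
    Total_Precip       = total,
    Avg_Intensity      = total / duration_hr,
    Increase_Discharge = max(data$Discharge[rows]) - baseline
  )
}

event_ids <- sort(unique(na.omit(events$rain_event)))
event_summary <- do.call(rbind, lapply(event_ids, summarise_event, data = events))
```

For this truncated slice the single row has a duration of 1.25 hours, a total depth of 1.0, an average intensity of 0.8 per hour, and a discharge increase of 0.025. The peak here is taken only from readings inside the event; if the stream peaks after the rain stops, the window would need to extend past the last rainy reading.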

full reprex
structure(list(Date = c("2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", "2010-05-01", 
"2010-05-01", "2010-05-01", "2010-05-01", "2010-05-02", "2010-05-02", 
"2010-05-02", "2010-05-02"), Time = c("0:00:00", "0:15:00", "0:30:00", 
"0:45:00", "1:00:00", "1:15:00", "1:30:00", "1:45:00", "2:00:00", 
"2:15:00", "2:30:00", "2:45:00", "3:00:00", "3:15:00", "3:30:00", 
"3:45:00", "4:00:00", "4:15:00", "4:30:00", "4:45:00", "5:00:00", 
"5:15:00", "5:30:00", "5:45:00", "6:00:00", "6:15:00", "6:30:00", 
"6:45:00", "7:00:00", "7:15:00", "7:30:00", "7:45:00", "8:00:00", 
"8:15:00", "8:30:00", "8:45:00", "9:00:00", "9:15:00", "9:30:00", 
"9:45:00", "10:00:00", "10:15:00", "10:30:00", "10:45:00", "11:00:00", 
"11:15:00", "11:30:00", "11:45:00", "12:00:00", "12:15:00", "12:30:00", 
"12:45:00", "13:00:00", "13:15:00", "13:30:00", "13:45:00", "14:00:00", 
"14:15:00", "14:30:00", "14:45:00", "15:00:00", "15:15:00", "15:30:00", 
"15:45:00", "16:00:00", "16:15:00", "16:30:00", "16:45:00", "17:00:00", 
"17:15:00", "17:30:00", "17:45:00", "18:00:00", "18:15:00", "18:30:00", 
"18:45:00", "19:00:00", "19:15:00", "19:30:00", "19:45:00", "20:00:00", 
"20:15:00", "20:30:00", "20:45:00", "21:00:00", "21:15:00", "21:30:00", 
"21:45:00", "22:00:00", "22:15:00", "22:30:00", "22:45:00", "23:00:00", 
"23:15:00", "23:30:00", "23:45:00", "0:00:00", "0:15:00", "0:30:00", 
"0:45:00"), Precipitation = c(0, 0, 0, 0.2, 0.2, 0, 0, 0.2, 0.4, 
0, 0, 0.2, 0, 0, 0, 0, 0.2, 0, 0, 0.6, 0.4, 0.2, 0, 0, 0, 0, 
0, 0, 0, 0, 0.2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0), Discharge = c(0.299, 0.302, 
0.305, 0.308, 0.312, 0.317, 0.321, 0.326, 0.33, 0.339, 0.352, 
0.38, 0.41, 0.428, 0.424, 0.419, 0.415, 0.41, 0.411, 0.412, 0.414, 
0.415, 0.416, 0.44, 0.459, 0.465, 0.495, 0.495, 0.495, 0.495, 
0.495, 0.495, 0.471, 0.453, 0.44, 0.431, 0.422, 0.423, 0.423, 
0.424, 0.424, 0.425, 0.425, 0.426, 0.427, 0.427, 0.428, 0.428, 
0.429, 0.429, 0.43, 0.43, 0.431, 0.431, 0.432, 0.432, 0.433, 
0.433, 0.434, 0.434, 0.435, 0.441, 0.471, 0.527, 0.565, 0.562, 
0.559, 0.556, 0.552, 0.665, 0.892, 0.941, 0.937, 0.933, 0.928, 
0.924, 0.843, 0.812, 0.765, 0.729, 0.693, 0.672, 0.644, 0.617, 
0.611, 0.591, 0.578, 0.569, 0.559, 0.55, 0.54, 0.538, 0.536, 
0.534, 0.532, 0.53, 0.528, 0.526, 0.524, 0.521)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -100L)) -> water_data

library(tidyverse)

# First, enumerate all the contiguous periods without precipitation
water_data |> 
  # flag when it rained
  mutate(rained = Precipitation > 0) |> 
  # apply 'cryptid visit' approach to numbering dry periods
  mutate(dry_period = if_else(!rained, cumsum(rained), NA)) |> 
  mutate(dry_period = factor(dry_period) |> as.numeric()) -> dry_periods

dry_periods |> head()
#> # A tibble: 6 × 6
#>   Date       Time    Precipitation Discharge rained dry_period
#>   <chr>      <chr>           <dbl>     <dbl> <lgl>       <dbl>
#> 1 2010-05-01 0:00:00           0       0.299 FALSE           1
#> 2 2010-05-01 0:15:00           0       0.302 FALSE           1
#> 3 2010-05-01 0:30:00           0       0.305 FALSE           1
#> 4 2010-05-01 0:45:00           0.2     0.308 TRUE           NA
#> 5 2010-05-01 1:00:00           0.2     0.312 TRUE           NA
#> 6 2010-05-01 1:15:00           0       0.317 FALSE           2

# Next, identify all the dry periods that lasted at least four hours
dry_periods |> 
  # add column that enumerates the observations, place it before 'Date' column
  mutate(observation = row_number(), .before = Date) |> 
  # find when each dry period begins and ends
  group_by(dry_period) |> 
  mutate(
    # set start value to earliest index for dry periods, NA for rainy ones
    period_start = if_else(!rained, min(observation), NA),
    # similarly for end value
    period_end = if_else(!rained, max(observation), NA),
    ) |> # view()
  # we no longer need to operate period-wise, so can ungroup table
  ungroup() |> 
  # calculate length of periods (in 15-minute intervals)
  mutate(period_length = period_end - period_start + 1) |> # view()
  # flag when period is at least four hours
  mutate(in_long_period = period_length >= 16) |>
  # flag 'dry spells' of four hours or more
  mutate(in_dry_spell = in_long_period & !rained) |> 
  # remove any columns that are no longer needed
  select(!contains(c('period'))) ->  dry_spells

dry_spells |> head()
#> # A tibble: 6 × 7
#>   observation Date       Time    Precipitation Discharge rained in_dry_spell
#>         <int> <chr>      <chr>           <dbl>     <dbl> <lgl>  <lgl>       
#> 1           1 2010-05-01 0:00:00           0       0.299 FALSE  FALSE       
#> 2           2 2010-05-01 0:15:00           0       0.302 FALSE  FALSE       
#> 3           3 2010-05-01 0:30:00           0       0.305 FALSE  FALSE       
#> 4           4 2010-05-01 0:45:00           0.2     0.308 TRUE   FALSE       
#> 5           5 2010-05-01 1:00:00           0.2     0.312 TRUE   FALSE       
#> 6           6 2010-05-01 1:15:00           0       0.317 FALSE  FALSE

# Now, notice that rain events are exactly the periods that fill the gaps
# before or after the dry spells, with one exception: if the data starts with a
# dry period that isn't long enough to be a dry spell, it should also be considered
# a dry spell so that it's not included as part of the following rain event.

dry_spells |> 
  # any observations that occur before any precipitation has been recorded
  # should be flagged as part of a dry spell 
  mutate(in_dry_spell = in_dry_spell | cumsum(rained) == 0) |> 
  # remove any columns that are no longer needed
  select(!contains(c('rain'))) ->  dry_spells

dry_spells |> head()
#> # A tibble: 6 × 6
#>   observation Date       Time    Precipitation Discharge in_dry_spell
#>         <int> <chr>      <chr>           <dbl>     <dbl> <lgl>       
#> 1           1 2010-05-01 0:00:00           0       0.299 TRUE        
#> 2           2 2010-05-01 0:15:00           0       0.302 TRUE        
#> 3           3 2010-05-01 0:30:00           0       0.305 TRUE        
#> 4           4 2010-05-01 0:45:00           0.2     0.308 FALSE       
#> 5           5 2010-05-01 1:00:00           0.2     0.312 FALSE       
#> 6           6 2010-05-01 1:15:00           0       0.317 FALSE

# Finally, since an observation is part of a rain event exactly when it isn't
# part of a dry spell, we can apply the 'cryptid visit' approach to numbering
# rain events, too
dry_spells |> 
  mutate(rain_event = if_else(!in_dry_spell, cumsum(in_dry_spell), NA)) |> 
  mutate(rain_event = factor(rain_event) |> as.numeric()) -> rain_events

rain_events |> head()
#> # A tibble: 6 × 7
#>   observation Date       Time    Precipitation Discharge in_dry_spell rain_event
#>         <int> <chr>      <chr>           <dbl>     <dbl> <lgl>             <dbl>
#> 1           1 2010-05-01 0:00:00           0       0.299 TRUE                 NA
#> 2           2 2010-05-01 0:15:00           0       0.302 TRUE                 NA
#> 3           3 2010-05-01 0:30:00           0       0.305 TRUE                 NA
#> 4           4 2010-05-01 0:45:00           0.2     0.308 FALSE                 1
#> 5           5 2010-05-01 1:00:00           0.2     0.312 FALSE                 1
#> 6           6 2010-05-01 1:15:00           0       0.317 FALSE                 1

Created on 2024-04-15 with reprex v2.0.2

Thank you so much, I have been trying to figure this out for so long now.
