> head(df)
yr mo day time sal site
2021 8 1 0000 26.614 14
2021 8 1 0015 25.724 14
2021 8 1 0030 25.739 14
2021 8 1 0045 25.831 14
2021 8 1 0100 25.798 14
2021 8 1 0115 25.667 14
The dataframe has +800k rows of continuously measured data in 15 min. increments for the past couple of years at "n" sites.
I'm attempting to make a matrix/table like this in R (that was originally created in Excel). Color is not important, I only hope to re-create the table itself. The values in this plot are the maximum/highest number of days (converted from consecutive minutes) where a range of values (sal) were consistently measured (ex. For each site, season, and year, create a table that shows the longest lasting run of continuous time where the values stayed between, for example, 40-50).
This is code I found somewhat useful, but it doesn't account for the fact that consecutive values need to be by an unbroken chain of the time.
I hope the function in the snippet below will get you started. I invented some data to work with. I did not invent different sites or seasons because I was too lazy. I your real data, you will want to group_by() all the appropriate variables.
The first thing I do is construct a datetime column so that time differences can be calculated.
What the function does is filter for the desired sal values and then calculate the time between neighboring rows. If that time is 15, then the rows are part of a continuous run. The rle function looks at the values in the Diff column and characterizes the run length of all the values. We are interested in runs of the value 15 and my code finds the longest such run.
library(dplyr)
library(lubridate)
#Invent data. Don't worry about the details here
Sequence <- seq.POSIXt(from = ymd_hm("2022-01-01 00:00"),
to = ymd_hm("2022-02-28 00:00"), by = "15 min")
DATA <- data.frame(year = 2022, month = month(Sequence), day = day(Sequence),
time = paste0(formatC(hour(Sequence),width = 2, flag = "0"),
formatC(minute(Sequence), width = 2, flag = "0")),
sal = rnorm(5569,40,5))
#Construct a POSIX datetime from the year, month etc.
DATA <- DATA |> mutate(DateTime = make_datetime(year,month,day,
hour=as.numeric(substr(time,1,2)),
min=as.numeric(substr(time,3,4))))
FindRun <- function(DF){
DF40_50 <- DF |> filter(between(sal, 40,50)) |>
mutate(Lag = lag(DateTime),
Diff = as.numeric(DateTime - Lag))
RUNs <- rle(DF40_50$Diff)
Lengths <- RUNs$lengths[RUNs$values==15]
MaxRun <- max(Lengths,na.rm = TRUE)
MaxRun *15/1440
}
FindRun(DATA)
#> [1] 0.07291667