Background
I have a very large dataset, df, where I have Length, Date and Edit Columns. My goal is to iterate through this dataset and find the index, start and end times for a given condition.
Working backwards, I need to get the index or row number where Edit is False *with the condition that the previous Edit is True.This will output an 'End' and the value that is in the Length column.
The Start is generated by going backwards from the 'End' index (Edit is False) and when you come across the next (Edit is False) + 1
Length Date Edit
20 1/2/2020 1:00:00 AM False
21 1/2/2020 1:00:01 AM True
81 1/2/2020 1:00:02 AM True
81 1/2/2020 1:00:03 AM True
90 1/2/2020 1:00:04 AM False
20 1/2/2020 1:00:05 AM True
90 1/2/2020 1:00:06 AM True
81 1/2/2020 1:00:10 AM True
90 1/2/2020 1:00:15 AM False
20 1/2/2020 1:00:25 AM True
Desired output
Start End Duration RowNum Length
1/2/2020 1:00:05 AM 1/2/2020 1:00:15 AM 10 8 90
1/2/2020 1:00:01 AM 1/2/2020 1:00:04 AM 3 4 90
Starting backwards, we see that the first End time is at, 1/2/2020 1:00:15 AM, because Edit is False, and its previous Edit value is True. The length is 90, and the RowNumber is 8. The Start would go backwards from 1/2/2020 1:00:15 AM until we come to another Edit is False line plus 1 , so it would be: 1/2/2020 1:00:05 AM
dput
structure(list(Length = c(20L, 21L, 81L, 81L, 90L, 20L, 90L,
81L, 90L, 20L), Date = structure(1:10, .Label = c("1/2/2020 1:00:00 AM",
"1/2/2020 1:00:01 AM", "1/2/2020 1:00:02 AM", "1/2/2020 1:00:03 AM",
"1/2/2020 1:00:04 AM", "1/2/2020 1:00:05 AM", "1/2/2020 1:00:06 AM",
"1/2/2020 1:00:10 AM", "1/2/2020 1:00:15 AM", "1/2/2020 1:00:25 AM"
), class = "factor"), Edit = c(FALSE, TRUE, TRUE, TRUE, FALSE,
TRUE, TRUE, TRUE, FALSE, TRUE)), class = "data.frame", row.names = c(NA,
-10L))
I have tried
library(dplyr)
library(readr)
for (i in 1:nrow(df) {
if (df[i] == Edit == "False") {
print(df[rows])
}
else if (df[i] < condition) {
print(df[rows])
}
}
mutate(Date = as.POSIXct(Date, format = '%m/%d/%Y %H:%M:%OS')) %>%
mutate(RowNum = cumsum(!cond)) %>%
group_by(Length) %>%
summarize(Start = min(Date),
End = max(Date),
Duration = End - Start) %>%
I am still researching how to put this altogether. any help or suggestions is greatly appreciated.