# Identify data with identical values and obtain corresponding time period

Hi everyone,
R noob here. I have a quick question: I have a huuuuuge dataset (30000 entries or so) of speeds (it's about ships) and dates/times at which these were recorded. Looks like this:
Speed Date Time
0 20191225 195752
5 20191225 185828
0 20191225 175816
0 20191225 165834
0 20191225 155852
0 20191225 145749
0 20191225 135733
5 20191225 125738
10 20191225 115808
... etc

What I'm trying to do is to identify when a ship has been immobile (speed=0) for more than 3 consecutive hours, and then remove this data. My thought process was: merge the date and time into a single column ymd_hms style. Then, create a column 'time difference between the previous and next column'... and then somehow let R identify when speed has been equal to zero over more than 3 consecutive hours, and then remove those (but even just identifying them would already be a big step forward).
So I found out how to convert date and time into a single 'datetime' column, which is nice. But now I'm completely at a loss as to what to do next, and seeing as this is a very specific example I'm struggling to find similar topic online.
I would be extremely grateful if someone could just point out some useful functions that could help me solve my problem?

I hope that you have oversimplified in your explanation because..
if the data concerns multiple ships, and there is no identifier in the measurements for which ship a measurement applies to, interpreting the behaviour of any one ship would seem to be an impossible task?
Are there in fact , ship ID's that you have omitted ?

Hi,
So my data concerns only a single ship, no separation per ship needed. Plus, the whole list is "in sequence", so I would really just need a script that detects when speed=0 over more than 3 consecutive hours.. does this still sound impossible?

You could do it in the following way. As you say that you are an R noob, you will have to study on it:

``````library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#>     date, intersect, setdiff, union
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)
library(tibble)

df1 <- tibble::tribble(
~Speed, ~Date, ~Time,
0, 20191225, 195752,
5, 20191225, 185828,
0, 20191225, 175816,
0, 20191225, 165834,
0, 20191225, 155852,
0, 20191225, 145749,
0, 20191225, 135733,
5, 20191225, 125738,
10, 20191225, 115808
)

sapply(df1,class)
#>     Speed      Date      Time
#> "numeric" "numeric" "numeric"

myfun <- function(data)  {
x <- data %>%
filter(Speed == 0)
if (nrow(x) == 0 ) return(data)
x0 <- x\$date_time[[1]]
data %>%
filter( (Speed != 0) | ((date_time - !!x0) / dhours(3) < 1) )
}

df2 <- df1 %>%
mutate (date_time = parse_date_time(paste(Date,Time),"ymd HMS")) %>%
select(-c(Date,Time)) %>%
arrange(date_time) %>%
print()
#> # A tibble: 9 x 2
#>   Speed date_time
#>   <dbl> <dttm>
#> 1    10 2019-12-25 11:58:08
#> 2     5 2019-12-25 12:57:38
#> 3     0 2019-12-25 13:57:33
#> 4     0 2019-12-25 14:57:49
#> 5     0 2019-12-25 15:58:52
#> 6     0 2019-12-25 16:58:34
#> 7     0 2019-12-25 17:58:16
#> 8     5 2019-12-25 18:58:28
#> 9     0 2019-12-25 19:57:52
df2 <- df2 %>%
mutate (blknr = cumsum(ifelse(Speed == 0, 0, 1) ) ) %>%
nest_by(blknr) %>%
mutate(
x = list(myfun(data))
) %>%
ungroup() %>%
select(-c(blknr,data)) %>%
tidyr::unnest(x) %>%
print()
#> # A tibble: 7 x 2
#>   Speed date_time
#>   <dbl> <dttm>
#> 1    10 2019-12-25 11:58:08
#> 2     5 2019-12-25 12:57:38
#> 3     0 2019-12-25 13:57:33
#> 4     0 2019-12-25 14:57:49
#> 5     0 2019-12-25 15:58:52
#> 6     5 2019-12-25 18:58:28
#> 7     0 2019-12-25 19:57:52
Created on 2021-12-08 by the reprex package (v2.0.1)
``````

Hi,

Thank you so much for your help, it worked on my data as well !! I really need to familiarize myself more with the functions in there.. but this is already a huge help !

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.