Partitioning timeseries data

Hi all. I have a dataset that contains a number of experimental runs, for example runs A, B, and C in the data below. I would like to pull out each stretch where the value is increasing and lies between 20 and 60 (exclusive). Is there a way to partition this data into multiple data frames, one per stretch?

Original df

time	value
1	10
2	20
3	34
4	45
5	55
6	66
7	77
8	55
9	28
10	12
11	23
12	43
13	56
14	67
15	89
16	54
17	32
18	11
19	18
20	24
21	28
22	37
23	45
24	54
25	59
26	67
27	41
28	32
29	15
30	2

Where A would be:

3	34
4	45
5	55

B:

11	23
12	43
13	56

C:

20	24
21	28
22	37
23	45
24	54
25	59

Have a look at lead() and lag() from dplyr. You could add a second column to df that holds the value shifted by one row, then compare the two columns: TRUE if the current value is bigger than the previous one, FALSE otherwise. Filter on the TRUEs and you'll have the time indices for A, B and C.
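
A minimal sketch of that idea, assuming a made-up six-row series rather than the data from the question. The column names previous and increasing are only illustrative:

```r
library(dplyr)

# Small made-up series (an assumption, not the poster's data)
df <- data.frame(time = 1:6, value = c(10, 25, 40, 30, 35, 50))

flagged <- df %>%
  mutate(
    previous   = lag(value),        # value from the row above (NA in row 1)
    increasing = value > previous   # TRUE where the series went up
  )

# Keep only the rising rows; filter() also drops the NA in row 1
rising <- flagged %>% filter(increasing)
print(rising$time)
```

Using lead() instead would compare each row with the next one, so the flag would mark the row before each rise rather than the row after it.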

Thanks a lot, that's got me a lot closer

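library(dplyr)

df2 <- df %>%
  mutate(helper_col = value - lag(value)) %>%  # change from the previous row
  filter(helper_col >= 0) %>%                  # keep non-decreasing rows
  filter(value > 20 & value < 60)              # keep values strictly between 20 and 60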

The above code gets me the data I want, but how do I index A, B, C, etc.?

This will take you a step closer:

library(tidyverse)

# Sample data in a copy/paste-friendly format
df <- data.frame(
    time = c(1,2,3,4,5,6,7,8,9,10,11,12,13,
             14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,
             30),
    value = c(10,20,34,45,55,66,77,55,28,12,23,
              43,56,67,89,54,32,11,18,24,28,37,45,54,59,67,41,
              32,15,2)
)

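df %>% 
    mutate(helper_col = value - lag(value)) %>%   # change from the previous row
    filter(helper_col >= 0) %>%                   # keep non-decreasing rows
    filter(value > 20 & value < 60) %>%           # keep values strictly between 20 and 60
    select(-helper_col) %>% 
    # Start a new group wherever the remaining time values skip ahead;
    # the group id is the row number of the first row in each run
    mutate(group = if_else(time - lag(time) > 1 | is.na(lag(time)), row_number(), NA_integer_)) %>% 
    fill(group, .direction = "down") %>%          # carry each id down its run
    group_split(group)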
#> [[1]]
#> # A tibble: 3 x 3
#>    time value group
#>   <dbl> <dbl> <int>
#> 1     3    34     1
#> 2     4    45     1
#> 3     5    55     1
#> 
#> [[2]]
#> # A tibble: 3 x 3
#>    time value group
#>   <dbl> <dbl> <int>
#> 1    11    23     4
#> 2    12    43     4
#> 3    13    56     4
#> 
#> [[3]]
#> # A tibble: 6 x 3
#>    time value group
#>   <dbl> <dbl> <int>
#> 1    20    24     7
#> 2    21    28     7
#> 3    22    37     7
#> 4    23    45     7
#> 5    24    54     7
#> 6    25    59     7
#> 
#> attr(,"ptype")
#> # A tibble: 0 x 3
#> # … with 3 variables: time <dbl>, value <dbl>, group <int>

Created on 2020-03-13 by the reprex package (v0.3.0.9001)
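
The group ids in the output above are 1, 4 and 7, since each one is the row number where its run starts. If you would rather have the pieces labelled A, B, C directly, one possible variation (a sketch, not part of the reprex above) is to number the runs with cumsum() and look the labels up in LETTERS. The names new_run, run and runs are made up for this example, and LETTERS only covers up to 26 runs:

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  time  = 1:30,
  value = c(10, 20, 34, 45, 55, 66, 77, 55, 28, 12, 23, 43, 56, 67, 89,
            54, 32, 11, 18, 24, 28, 37, 45, 54, 59, 67, 41, 32, 15, 2)
)

rising <- df %>%
  # Non-decreasing rows with a value strictly between 20 and 60
  filter(value - lag(value) >= 0, value > 20, value < 60) %>%
  mutate(
    # TRUE on the first row and wherever the remaining times skip ahead
    new_run = is.na(lag(time)) | time - lag(time) > 1,
    # Running count of run starts, mapped to "A", "B", "C", ...
    run = LETTERS[cumsum(new_run)]
  ) %>%
  select(-new_run)

# Named list with one data frame per run
runs <- split(rising[c("time", "value")], rising$run)
print(names(runs))
```

After that, runs$A, runs$B and runs$C can be pulled out by name instead of by list position.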

Thanks so much, that's exactly what I was looking for!