Analyzing high frequency time series

Dear all,

I have this data in a CSV file. It is a small sample of the complete database, which contains about 6 million records, so my real problem is much bigger than this one!

      Date time              X1(mss)   Y1(mss)   Z1(mss)

2020-01-17 23:47:50:3 -0.268 0.603 6.982
2020-01-17 23:47:50:4 -0.213 0.541 7.013
2020-01-17 23:47:50:5 -0.132 0.546 6.953
2020-01-17 23:47:50:6 -0.232 0.460 7.039
2020-01-17 23:47:50:7 -0.148 0.637 6.975
2020-01-17 23:47:50:8 -0.158 0.503 7.025
2020-01-17 23:47:50:9 -0.158 0.608 6.984
2020-01-17 23:47:51:0 -0.196 0.496 7.003
2020-01-17 23:47:51:1 -0.153 0.603 7.023
2020-01-17 23:47:51:2 -0.170 0.558 7.008
2020-01-17 23:47:51:3 -0.182 0.567 7.020
2020-01-17 23:47:51:4 -0.177 0.534 7.023
2020-01-17 23:47:51:5 -0.194 0.555 7.025
2020-01-17 23:47:51:6 -0.177 0.572 6.991
2020-01-17 23:47:51:7 -0.180 0.563 7.023
2020-01-17 23:47:51:8 -0.180 0.536 6.996
2020-01-17 23:47:51:9 -0.194 0.553 7.037
2020-01-17 23:47:52:0 -0.170 0.555 7.001
2020-01-17 23:47:52:1 -0.194 0.546 7.027

In the first column I have a high-frequency record containing the DATE and TIME at which each event (X1, Y1, Z1) was recorded. Note that the TIME was recorded (by my datalogger) to tenths of a second, in the format "HH:MM:SS:OS", where the last digit is the tenths of a second. I need to group these records per second, taking the average value of each variable X, Y and Z. However, R does not recognize this time format. I have two questions:

1- How do I get R to recognize this format as a time series?

2- How do I group the tenths-of-a-second records into one record per second, with the average value of each variable X, Y and Z?

Can someone help me, please?

I'm a beginner in R...

Thank you all!

What I've done might be considered 'ugly', so there's almost certainly a more elegant approach; I leave that as a challenge to the rest of the forum :wink: But I believe this is a start.


library(tidyverse)
library(lubridate)
library(readr)
csv_text <- "date,time,x1_mss,y1_mss,z1_mss
17/01/2020,23:47:50:3,-0.268,0.603,6.982
17/01/2020,23:47:50:4,-0.213,0.541,7.013
17/01/2020,23:47:50:5,-0.132,0.546,6.953
17/01/2020,23:47:50:6,-0.232,0.46,7.039
17/01/2020,23:47:50:7,-0.148,0.637,6.975
17/01/2020,23:47:50:8,-0.158,0.503,7.025
17/01/2020,23:47:50:9,-0.158,0.608,6.984
17/01/2020,23:47:51:0,-0.196,0.496,7.003
17/01/2020,23:47:51:1,-0.153,0.603,7.023
17/01/2020,23:47:51:2,-0.17,0.558,7.008
17/01/2020,23:47:51:3,-0.182,0.567,7.02
17/01/2020,23:47:51:4,-0.177,0.534,7.023
17/01/2020,23:47:51:5,-0.194,0.555,7.025
17/01/2020,23:47:51:6,-0.177,0.572,6.991
17/01/2020,23:47:51:7,-0.18,0.563,7.023
17/01/2020,23:47:51:8,-0.18,0.536,6.996
17/01/2020,23:47:51:9,-0.194,0.553,7.037
17/01/2020,23:47:52:0,-0.17,0.555,7.001
17/01/2020,23:47:52:1,-0.194,0.546,7.027"

my_raw_data <- readr::read_csv(file = csv_text)

# Rebuild a parseable timestamp: split "dd/mm/yyyy:HH:MM:SS:T" on ":",
# parse the whole-second part with dmy_hms(), then add the tenths back on
my_d2 <- my_raw_data %>%
  mutate(full_dt_string = paste0(date, ":", time),
         splitout = stringr::str_split(full_dt_string, ":")) %>%
  rowwise() %>%
  mutate(dmy_hms_strpart = paste0(splitout[[1]], " ", splitout[[2]], ":",
                                  splitout[[3]], ":", splitout[[4]]),
         dmy_hms = dmy_hms(dmy_hms_strpart),
         full_date_time = dmy_hms + seconds(parse_integer(splitout[[5]]) / 10),
         fdt_round_sec = lubridate::round_date(full_date_time, unit = "second"))

# Keep only the rounded timestamp and the measurements, then average per second
my_d3 <- select(my_d2,
                fdt_round_sec,
                x1_mss,
                y1_mss,
                z1_mss) %>%
  group_by(fdt_round_sec) %>%
  summarise_all(~ mean(.))

> my_d3
# A tibble: 3 x 4
  fdt_round_sec       x1_mss y1_mss z1_mss
  <dttm>               <dbl>  <dbl>  <dbl>
1 2020-01-17 23:47:50 -0.240  0.572   7.00
2 2020-01-17 23:47:51 -0.171  0.551   7.01
3 2020-01-17 23:47:52 -0.184  0.554   7.01
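To take up the "more elegant approach" challenge above, here is one possible sketch: lubridate's parsers understand fractional seconds, so you can turn the trailing `:tenth` into `.tenth` and parse the whole timestamp in one call, with no `rowwise()` needed. It uses a tiny inline sample (so the snippet runs on its own) and `floor_date()` rather than `round_date()`; that means a reading at 23:47:50.8 is grouped into the 23:47:50 second, whereas `round_date()` puts it in 23:47:51, so the group means will differ slightly from the table above. `across()` requires dplyr >= 1.0.

```r
library(dplyr)
library(lubridate)

# Tiny sample in the same shape as the data above; base read.csv keeps it self-contained
csv_sample <- "date,time,x1_mss,y1_mss,z1_mss
17/01/2020,23:47:50:8,-0.158,0.503,7.025
17/01/2020,23:47:50:9,-0.158,0.608,6.984
17/01/2020,23:47:51:0,-0.196,0.496,7.003"
dat <- read.csv(text = csv_sample, stringsAsFactors = FALSE)

per_second <- dat %>%
  mutate(
    # "23:47:50:8" -> "23:47:50.8", so dmy_hms() reads the tenth as a fractional second
    timestamp = dmy_hms(paste(date, sub(":([0-9])$", ".\\1", time))),
    # truncate each timestamp to its whole second
    second = floor_date(timestamp, unit = "second")
  ) %>%
  group_by(second) %>%
  # average every measurement column per second
  summarise(across(ends_with("_mss"), mean))
```

On the full 6-million-row file you would of course replace the inline sample with `read_csv()` on your file; the mutate/group_by/summarise part stays the same.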

Thank you very much!!!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.