Dynamic Time Warping for time series with more than 1 variable of interest

denisepatrick · January 8, 2021, 4:17pm

Hi there,

I have been exploring the 'dtw' package to calculate the distance between multiple time series.

This Stack Overflow post gives a wonderful working example of how to use the dtw package for univariate time series data.

Here they have a data frame that looks like this:

```r
# data: 8 observations per car, 3 cars
file.ID2 <- c("Cars_03", "Cars_03", "Cars_03", 
              "Cars_03", "Cars_03", "Cars_03", "Cars_03", "Cars_03", "Cars_04", 
              "Cars_04", "Cars_04", "Cars_04", "Cars_04", "Cars_04", "Cars_04", 
              "Cars_04", "Cars_05", "Cars_05", "Cars_05", "Cars_05", "Cars_05", 
              "Cars_05", "Cars_05", "Cars_05")
speed.kph.ED <- c(129.3802848, 
                  129.4022304, 129.424176, 129.4461216, 129.4680672, 129.47904, 
                  129.5009856, 129.5229312, 127.8770112, 127.8221472, 127.7672832, 
                  127.7124192, 127.6575552, 127.6026912, 127.5478272, 127.4929632, 
                  134.1095616, 134.1205344, 134.1315072, 134.1534528, 134.1644256, 
                  134.1753984, 134.1863712, 134.197344)

df <- data.frame(file.ID2, speed.kph.ED)
df
```

Here we have 3 univariate time series (one for each car), each with 1 variable of interest: speed.kph.ED.

As suggested by the accepted answer, here is the procedure to calculate the distance between the 3 cars using dtw:

```r
library(dtw)
library(purrr)
library(dplyr)

# Split your data frame into a list by file.ID2
ds <- split(df, df$file.ID2)
ds

# Use expand.grid to make all combinations of your names (file.ID2) and of your values
Names <- expand.grid(unique(df$file.ID2), unique(df$file.ID2))
Values <- expand.grid(ds, ds)

# purrr::map_dbl iterates through all row combinations of Values and returns a vector of doubles
Dist <- map_dbl(1:nrow(Values), ~ dtw(x = Values[.x, ]$Var1[[1]]$speed.kph.ED,
                                      y = Values[.x, ]$Var2[[1]]$speed.kph.ED)$distance)

# Bind the distances to Names (dplyr is already loaded above)
ans <- Names %>%
  mutate(distance = Dist)

ans
```
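The loop above evaluates all 9 ordered pairs, so each distance between two different cars is computed twice (Cars_03 vs Cars_04 and Cars_04 vs Cars_03), and the 3 self-comparisons are always 0. A minimal sketch, assuming you only need each unordered pair once: base R's combn() can stand in for expand.grid() and purrr, with the results stored in a symmetric matrix. `pairs`, `dist_vals` and `D` are illustrative names. Mirroring each value into both triangles relies on dtw()'s default step pattern being symmetric, which I believe holds but is worth checking on your own data.

```r
library(dtw)

# Data from the question; rep(..., each = 8) reproduces the file.ID2 vector
file.ID2 <- rep(c("Cars_03", "Cars_04", "Cars_05"), each = 8)
speed.kph.ED <- c(129.3802848, 129.4022304, 129.424176, 129.4461216,
                  129.4680672, 129.47904, 129.5009856, 129.5229312,
                  127.8770112, 127.8221472, 127.7672832, 127.7124192,
                  127.6575552, 127.6026912, 127.5478272, 127.4929632,
                  134.1095616, 134.1205344, 134.1315072, 134.1534528,
                  134.1644256, 134.1753984, 134.1863712, 134.197344)
df <- data.frame(file.ID2, speed.kph.ED)

# One numeric vector of speeds per car, named Cars_03, Cars_04, Cars_05
ds <- split(df$speed.kph.ED, df$file.ID2)
ids <- names(ds)

# Each unordered pair of car indices once: rows (1,2), (1,3), (2,3)
pairs <- t(combn(length(ds), 2))
dist_vals <- apply(pairs, 1, function(p) dtw(ds[[p[1]]], ds[[p[2]]])$distance)

# Symmetric matrix with zeros on the diagonal (self-distances)
D <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
D[pairs] <- dist_vals          # upper triangle
D[pairs[, 2:1]] <- dist_vals   # mirror into the lower triangle
D
as.dist(D)  # compact "dist" object
```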

Now I am wondering: what if this data frame has two other variables that I also want to take into consideration?

Let's say I created these 2 extra variables, score.kph.ED and rating.kph.ED:

```r
score.kph.ED <- 1:24
rating.kph.ED <- 25:48

df <- data.frame(file.ID2, speed.kph.ED, score.kph.ED, rating.kph.ED)
df
```

That is, the distance between the 3 cars should now be calculated not only from speed.kph.ED, but also from score.kph.ED and rating.kph.ED.
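One way to picture what "taking all three variables into consideration" means for DTW: the alignment recurrence stays the same, and only the local cost between time point i of one car and time point j of another changes, from |x_i - y_j| to the Euclidean distance between the two rows of variables. The base-R function below is a from-scratch illustration of that idea; `dtw_sketch` is a hypothetical name, not part of the dtw package. Its step weights follow the symmetric2 pattern, which I believe is dtw()'s default, but treat any exact agreement with dtw() as an assumption.

```r
# dtw_sketch(): hypothetical, illustration-only helper (not in the dtw package).
# Step weights follow symmetric2 as I understand it: a diagonal step adds twice
# the local cost, a horizontal or vertical step adds it once.
dtw_sketch <- function(x, y) {
  # Rows are time points, columns are variables; a plain vector becomes one column
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(ncol(x) == ncol(y))
  n <- nrow(x)
  m <- nrow(y)

  # Local cost: Euclidean distance between time point i of x and time point j
  # of y over all variables at once (this is the only multivariate part)
  d <- matrix(0, n, m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d[i, j] <- sqrt(sum((x[i, ] - y[j, ])^2))
    }
  }

  # Accumulated cost; Inf stands for steps that would leave the grid
  g <- matrix(Inf, n, m)
  g[1, 1] <- d[1, 1]
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      if (i == 1 && j == 1) next
      diag_step <- if (i > 1 && j > 1) g[i - 1, j - 1] + 2 * d[i, j] else Inf
      up_step   <- if (i > 1) g[i - 1, j] + d[i, j] else Inf
      left_step <- if (j > 1) g[i, j - 1] + d[i, j] else Inf
      g[i, j] <- min(diag_step, up_step, left_step)
    }
  }
  g[n, m]  # unnormalized alignment cost
}

# Two toy series with 2 variables each; b repeats its second time point
a <- cbind(c(0, 1, 2), c(5, 6, 7))
b <- cbind(c(0, 1, 1, 2), c(5, 6, 6, 7))
dtw_sketch(a, b)  # 0: the warping path absorbs the repeated point
```

Because the recurrence never looks at individual columns, adding score.kph.ED and rating.kph.ED only means passing wider matrices; how much each variable counts is then decided by its scale.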

How can I modify the existing code so that I can achieve this goal?
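A minimal sketch of how the accepted answer could be adapted, under two assumptions. First, that dtw() accepts two numeric matrices with the same number of columns (rows = time points) and, with the default dist.method = "Euclidean", builds the local cost from row-to-row Euclidean distances via proxy::dist; that is my understanding of the package. Second, that the three variables should be z-scored over the whole data set beforehand, since speed.kph.ED sits around 130 while score.kph.ED runs from 1 to 24, and the raw ranges would otherwise decide how much each variable counts. The names `vars`, `mats` and `pairs` are illustrative. If matrix input behaves differently in your dtw version, an alternative is to compute the cross-distance matrix yourself (e.g. with proxy::dist(x, y)) and pass it to dtw() as its single argument, which dtw() documents as a local cost matrix.

```r
library(dtw)

# Data from the question; rep(..., each = 8) reproduces the file.ID2 vector
file.ID2 <- rep(c("Cars_03", "Cars_04", "Cars_05"), each = 8)
speed.kph.ED <- c(129.3802848, 129.4022304, 129.424176, 129.4461216,
                  129.4680672, 129.47904, 129.5009856, 129.5229312,
                  127.8770112, 127.8221472, 127.7672832, 127.7124192,
                  127.6575552, 127.6026912, 127.5478272, 127.4929632,
                  134.1095616, 134.1205344, 134.1315072, 134.1534528,
                  134.1644256, 134.1753984, 134.1863712, 134.197344)
score.kph.ED <- 1:24
rating.kph.ED <- 25:48
df <- data.frame(file.ID2, speed.kph.ED, score.kph.ED, rating.kph.ED)

vars <- c("speed.kph.ED", "score.kph.ED", "rating.kph.ED")

# Assumption: z-score each variable over all cars so no variable dominates
# the Euclidean local distance just because of its units
df[vars] <- scale(df[vars])

# One matrix per car: rows = time points, columns = variables
mats <- lapply(split(df[vars], df$file.ID2), as.matrix)

# All ordered pairs of cars, as in the accepted answer
pairs <- expand.grid(Var1 = names(mats), Var2 = names(mats),
                     stringsAsFactors = FALSE)

# dtw() on two matrices: local cost = Euclidean distance between rows
pairs$distance <- mapply(function(a, b) dtw(mats[[a]], mats[[b]])$distance,
                         pairs$Var1, pairs$Var2)
pairs
```

Dropping the scale() line gives the same computation in raw units, which may be what you want if the units are already comparable.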

Thanks so much for your help!

system · January 29, 2021, 4:17pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.