R is chock-full of the "a little knowledge is a dangerous thing" problem. I find myself chasing a how question when I should be chasing a what question.
My drill, after too many years of wandering around googling, is to look at help() for the function that produced my result, in this case dtw::dtw. There you will find a section that is tempting to skip over: Value. It describes the return value of a function, that is, what the function produces. Here we see:
An object of class dtw with the following items:
• distance: the minimum global distance computed, not normalized.
• normalizedDistance: distance computed, normalized for path length, if normalization is known for chosen step pattern.
[and other stuff]
The first thing to know is that the return values are contained in an object of class dtw, which means it may be necessary to examine it with str() to figure out how to extract the values of interest.
Then look at Details:
The function performs Dynamic Time Warp (DTW) and computes the optimal alignment between two time series x and y, given as numeric vectors. The "optimal" alignment minimizes the sum of distances between aligned elements. Lengths of x and y may differ.
The local distance between elements of x (query) and y (reference) can be computed in one of the following ways: [more detail]
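To make the Details paragraph concrete, here is a minimal Python sketch of the textbook DTW recursion. It is my own illustration, not the dtw package's code. Assumptions: the local distance is |a - b| for numeric vectors, and every step has weight 1. The package's default step pattern, symmetric2, weights steps differently, so dtw::dtw will not in general report this same number.

```python
import math


def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """Textbook DTW with unit step weights (illustrative sketch only).

    Not the dtw package's implementation; its default step pattern
    (symmetric2) weights diagonal steps differently.
    """
    n, m = len(x), len(y)
    # g[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            # cheapest predecessor: step diagonally, or advance in only
            # one of the two series (which repeats an element of the other)
            g[i][j] = cost + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return g[n][m]


# Lengths may differ: 5 query points against 3 reference points.
print(dtw_distance([0, 1, 2, 3, 3], [0, 2, 3]))  # → 1.0
```

Swapping the two arguments gives the same value here, because both the local distance and the unit step weights are symmetric.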
OK, that gives some sense. Maybe it's like Euclidean distance? But that's a guess.
Next, is there a paper that introduces the package? Yes, it's in the References section for the function:
Toni Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software, 31(7), 1-24. doi:10.18637/jss.v031.i07
From the Abstract we learn:
Dynamic time warping is a popular technique for comparing time series, providing both a distance measure that is insensitive to local compression and stretches and the warping which optimally deforms one of the two input series onto the other.
That seems promising; we now know what distance is supposed to do.
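The abstract's claim that the distance is insensitive to local compression and stretching can be checked with a small Python sketch (my own illustration, not code from the paper or the package). Assumptions: unit step weights and |a - b| as the local distance. Two equal-length series carry the same bump, one shifted later; a lockstep Euclidean comparison penalizes the shift, while DTW warps the bump onto itself. The function also backtracks the warping path, which is the "warping which optimally deforms one of the two input series onto the other".

```python
import math


def dtw_with_path(x, y):
    """Unit-weight DTW on |a - b|, returning (distance, warping path).

    Illustrative sketch only, not the dtw package's implementation.
    """
    n, m = len(x), len(y)
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            g[i][j] = cost + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    # Backtrack from the end cell, always moving to the cheapest predecessor;
    # by construction of g, that predecessor lies on an optimal path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))  # 0-based indices into x and y
        preds = {(i - 1, j - 1): g[i - 1][j - 1],
                 (i - 1, j): g[i - 1][j],
                 (i, j - 1): g[i][j - 1]}
        i, j = min(preds, key=preds.get)
    return g[n][m], path[::-1]


def euclidean(x, y):
    """Lockstep distance: point k of x is compared only with point k of y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x = [0, 0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 0, 1, 2, 1, 0]  # the same bump, two steps later
print(euclidean(x, y))  # sqrt(10), about 3.162
dist, path = dtw_with_path(x, y)
print(dist)             # 0.0: the bump in x is warped onto the bump in y
print(path)
```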
In the section Definition of the Algorithm there is:
Note that d, the cross-distance matrix between vectors X and Y, is the only input to the DTW algorithm: elements x_i and y_j only enter the computation through the arguments of f. Therefore, the following discussion applies, with no loss of generality, to cases when X and Y are single- or multi-variate, continuous, nominal, or mixed, as long as f(·, ·) is suitably defined. While the most common choice is to assume the Euclidean distance, different definitions (e.g., those provided by the proxy package, Meyer and Buchta 2009) may be useful as well. (Emphasis added)
So the guess about Euclidean distance has some support (only some, because other distances can be used). Of course, if I didn't know what Euclidean distance meant, I'd have more schooling to undergo, and at some point I either feel confident or, sometimes, give up.
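Because the cross-distance matrix d is the only input, the choice of f can be swapped without touching the alignment step. A minimal Python sketch of that separation, assuming unit step weights: the hypothetical cross_distance builds the matrix from any two-argument function, and the hypothetical dtw_from_matrix never looks at the raw series. The manhattan function merely stands in for "a different definition"; it is not one of the proxy package's functions.

```python
import math


def cross_distance(xs, ys, f):
    """d[i][j] = f(xs[i], ys[j]): the only thing the DTW recursion needs."""
    return [[f(a, b) for b in ys] for a in xs]


def dtw_from_matrix(d):
    """Unit-weight DTW on a precomputed cross-distance matrix (sketch)."""
    n, m = len(d), len(d[0])
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = d[i - 1][j - 1] + min(
                g[i - 1][j - 1], g[i - 1][j], g[i][j - 1]
            )
    return g[n][m]


def euclidean(a, b):
    # Works for multivariate points given as tuples (math.dist: Python 3.8+).
    return math.dist(a, b)


def manhattan(a, b):
    # A different f, used only to show that f is pluggable.
    return sum(abs(p - q) for p, q in zip(a, b))


# Two short bivariate series of different lengths.
xs = [(0, 0), (1, 1), (2, 2)]
ys = [(0, 0), (2, 2)]
print(dtw_from_matrix(cross_distance(xs, ys, euclidean)))  # sqrt(2), about 1.414
print(dtw_from_matrix(cross_distance(xs, ys, manhattan)))  # → 2.0
```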
That takes care of distance. What is normalized distance? The answer follows close behind, in the kind of equation that makes eyes roll to the back of the head:
$$ d_\phi(X, Y) = \sum_{k=1}^{T} \frac{d(\phi_x(k), \phi_y(k))\, m_\phi(k)}{M_\phi} $$
where m_\phi(k) is a per-step weighting coefficient, T is the length of the warping path, and M_\phi is the corresponding normalization constant.
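A Python sketch can make the roles of m_φ(k) and M_φ tangible. The specific weights are my assumption, modeled loosely on a symmetric2-style pattern: 2 for a diagonal step, 1 for a horizontal or vertical step, with the first cell counted as a diagonal step from the origin. Under those weights every path's weights add up to n + m, so dividing by M_φ = n + m turns the weighted sum into a weighted average of local distances. The dtw package's exact conventions may differ; check its step-pattern documentation rather than treating this as its implementation.

```python
import math


def dtw_symmetric2(x, y, dist=lambda a, b: abs(a - b)):
    """Sketch of DTW with assumed symmetric2-style step weights.

    m_phi(k) = 2 for a diagonal step, 1 for a horizontal or vertical step;
    the first cell counts as a diagonal step from the origin. Every path's
    weights then sum to n + m, used here as M_phi.
    Returns (raw weighted distance, normalized distance).
    """
    n, m = len(x), len(y)
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            g[i][j] = min(g[i - 1][j - 1] + 2 * d,  # diagonal, weight 2
                          g[i - 1][j] + d,          # vertical, weight 1
                          g[i][j - 1] + d)          # horizontal, weight 1
    distance = g[n][m]
    return distance, distance / (n + m)


# Every local distance is 1 in these examples.
print(dtw_symmetric2([0] * 3, [1] * 3))    # → (6.0, 1.0)
print(dtw_symmetric2([0] * 12, [1] * 12))  # → (24.0, 1.0)
print(dtw_symmetric2([0] * 3, [1] * 5))    # → (8.0, 1.0)
```

The raw distance grows with n + m, while the normalized value stays at 1 whatever the lengths. That is the sense in which a normalized distance can be compared across series of different lengths.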
Okaaaay, so by now we know the drill and it's back to class.