Speed up Euclidean distance calculation

Hi,
I'm trying to calculate the Euclidean distance between two vectors in order to build a distance matrix.

This is the code I'm using right now:

```r
# Euclidean distance between two numeric vectors of equal length
euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))
```

The problem is that it takes a lot of time, because I'm iterating over a lot of rows. Is there any way to speed up this calculation?

Thank you!
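For comparison, here is a sketch of what a row-by-row loop built on `euc.dist()` presumably looks like. The actual loop wasn't posted, so `loop_dist()` is a hypothetical reconstruction; its result is checked against the built-in `dist()`, which does the same arithmetic in compiled C code.

```r
# Euclidean distance between two numeric vectors (the function from the question)
euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))

# Hypothetical row-by-row version (assumed, not the poster's code):
# fill an m x m matrix one pair at a time.
loop_dist <- function(mat) {
  m <- nrow(mat)
  out <- matrix(0, m, m)
  for (i in seq_len(m)) {
    for (j in seq_len(i - 1)) {
      # distances are symmetric: compute each pair once and mirror it
      out[i, j] <- out[j, i] <- euc.dist(mat[i, ], mat[j, ])
    }
  }
  out
}

set.seed(1)
dat <- matrix(rnorm(200 * 5), nrow = 200)
system.time(d_loop <- loop_dist(dat))            # interpreted double loop
system.time(d_builtin <- as.matrix(dist(dat)))   # one call into compiled code
```

The double loop makes m(m-1)/2 separate R function calls, and that interpreter overhead is typically where most of the time goes; `dist()` avoids it.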

Perhaps just try the built-in dist() function, which computes Euclidean distance by default.

I just tried it, and it's still very slow.

Computing all pairwise distances takes time that grows quadratically with the number of rows, so it will be slow when you have a lot of rows. But you shouldn't be iterating manually: dist() takes a whole matrix at once. If your input matrix has m rows and n columns, the resulting distance matrix will be m by m.
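To put numbers on that growth: with m rows and n columns there is one distance per unordered pair of rows, and each distance touches all n columns.

$$
\binom{m}{2} = \frac{m(m-1)}{2} \ \text{distances}, \qquad \text{total work} \in O\!\left(\frac{m(m-1)}{2}\, n\right) = O(m^2 n)
$$

As an illustrative size (not the poster's), m = 100,000 rows gives about 5 × 10^9 distances; stored as 8-byte doubles that is roughly 40 GB, so at that scale the quadratic growth itself becomes the bottleneck, whatever the implementation.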

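```r
m <- 100  # number of rows (observations)
n <- 5    # number of columns (features)

dat <- matrix(rnorm(m*n), nrow=m)
# dist() computes all pairwise row distances in one call;
# as.matrix() expands the result into a full m x m matrix
distout <- as.matrix(dist(dat))
dim(distout)
#> [1] 100 100
```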

Created on 2020-08-03 by the reprex package (v0.3.0)
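Note that `dist()` itself stores only the lower triangle, m(m-1)/2 values; `as.matrix()` roughly doubles the memory by filling in the full symmetric m by m matrix. When m is large it can pay to keep the `dist` object and read single entries from it. A sketch, with a hypothetical `dist_pair()` helper built on the index formula documented in the `?dist` help page:

```r
# Hypothetical helper: look up the distance between rows i and j directly in a
# `dist` object, without expanding it to a full m x m matrix via as.matrix().
# The index formula is the one documented on the ?dist help page (for i < j).
dist_pair <- function(d, i, j) {
  if (i == j) return(0)
  if (i > j) { tmp <- i; i <- j; j <- tmp }  # distances are symmetric
  size <- attr(d, "Size")                     # number of observations
  d[size * (i - 1) - i * (i - 1) / 2 + j - i]
}

m <- 100
n <- 5
set.seed(123)
dat <- matrix(rnorm(m * n), nrow = m)
d <- dist(dat)

length(d)           # 4950, i.e. m * (m - 1) / 2 (lower triangle only)
dist_pair(d, 3, 7)  # same value as as.matrix(d)[3, 7]
```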

In my case I have to iterate over rows. Do you know anything about using the sweep() function to compute Euclidean distance? I understand it might be faster, but I don't know how to implement it.
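`sweep()` can remove the inner loop. With `MARGIN = 2`, `sweep(mat, 2, x)` subtracts `x[j]` from column `j`, i.e. subtracts the vector `x` from every row (the default `FUN` is `"-"`), so a single call yields the distances from `x` to all rows at once. A minimal sketch; `dist_to_rows()` and `dist_matrix_sweep()` are hypothetical names, not part of any package:

```r
euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))

# Hypothetical helper: distance from one vector x to every row of mat.
# sweep(mat, 2, x) subtracts x from each row (STATS matched to columns).
dist_to_rows <- function(mat, x) {
  stopifnot(length(x) == ncol(mat))
  sqrt(rowSums(sweep(mat, 2, x)^2))
}

# Hypothetical full matrix: still one loop over rows, but each iteration
# is a single vectorized call instead of an inner loop over rows.
dist_matrix_sweep <- function(mat) {
  m <- nrow(mat)
  out <- matrix(0, m, m)
  for (i in seq_len(m)) out[i, ] <- dist_to_rows(mat, mat[i, ])
  out
}

set.seed(42)
mat <- matrix(rnorm(50), nrow = 10)
x <- rnorm(5)
d_sweep <- dist_to_rows(mat, x)
d_loop <- apply(mat, 1, euc.dist, x2 = x)  # same result, one call per row
```

This keeps the loop over rows that you need, but replaces m(m-1)/2 calls to `euc.dist()` with m vectorized calls. For a complete distance matrix, `dist()` will usually still be faster.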

Can you describe your data in more detail, or share part of it? I'm not sure whether you want the distance between two vectors or between two matrices.
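If it turns out you need distances between the rows of two different matrices (every row of A against every row of B), one vectorized option, swapped in here rather than taken from this thread, expands the squared distance as ||a − b||² = ||a||² + ||b||² − 2 a·b, so the heavy lifting becomes a single matrix product. A minimal sketch; `cross_dist()` is a hypothetical name:

```r
# Hypothetical helper: Euclidean distances between every row of A and every
# row of B, using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b (no explicit loops).
cross_dist <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  a2 <- rowSums(A^2)                               # squared norm of each row of A
  b2 <- rowSums(B^2)                               # squared norm of each row of B
  d2 <- outer(a2, b2, "+") - 2 * tcrossprod(A, B)  # nrow(A) x nrow(B)
  d2[d2 < 0] <- 0                                  # clamp tiny negatives from rounding
  sqrt(d2)
}

set.seed(1)
A <- matrix(rnorm(4 * 5), nrow = 4)
B <- matrix(rnorm(3 * 5), nrow = 3)
D <- cross_dist(A, B)
dim(D)  # 4 3
```

The subtraction can lose precision through floating-point cancellation, so entries that should be exactly zero (duplicate rows) may come out as small positive numbers; `dist()` computes differences directly and does not have that issue.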

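You can split your data frame into its numeric part and its character part.
Use caret's dummyVars() function to one-hot encode the character part so that dist() can be applied to it.
Then either recombine the dummy variables with the numeric columns and run dist() on the whole set, or run dist() on each part and add the results (the latter is demonstrated below). The two options are not equivalent: the sum of two Euclidean distances is never smaller than, and usually larger than, the Euclidean distance over the combined columns, although it is still a valid distance.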

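```r
library(caret)
library(tidyverse)

# three example rows: two character columns (A, B) and four numeric ones (C to F);
# stringsAsFactors = FALSE keeps A and B as character on R versions before 4.0
a<-list("a","b",1,2,3,2) %>% as.data.frame(stringsAsFactors = FALSE) %>% set_names(LETTERS[1:6])
b<-list("a","b",3,2,3,4) %>% as.data.frame(stringsAsFactors = FALSE) %>% set_names(LETTERS[1:6])
c<-list("c","a",3,2,3,4) %>% as.data.frame(stringsAsFactors = FALSE) %>% set_names(LETTERS[1:6])

all_o <- rbind(a,b,c)

# character part: one-hot encode it so that dist() can use it
(all_o_cat <- select_if(all_o,is.character))

(dmy_cats_formula <- dummyVars(" ~ .", data = all_o_cat))

(dmy_cats <- data.frame(predict(dmy_cats_formula, newdata = all_o_cat)))

# pairwise distances on the dummy columns (pairs 1-2, 1-3, 2-3)
(cat_dists <- dist(dmy_cats) %>% as.numeric())

# numeric part: pairwise distances on the raw numeric columns
(all_o_num <- select_if(all_o,is.numeric))
(num_dists <- dist(all_o_num) %>% as.numeric())

# add the two sets of distances
(sum_dists <- num_dists+cat_dists)
```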
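For the other option (recombine first, then a single `dist()` call), here is a base-R sketch with no package dependencies. `one_hot()` is a hypothetical helper standing in for caret's `dummyVars()`, producing one 0/1 column per level. On the three example rows, pair 1-3 comes out as sqrt(12) ≈ 3.46 here, versus 2 + sqrt(8) ≈ 4.83 from adding the two distance sets.

```r
# Hypothetical helper standing in for caret::dummyVars(): one 0/1 column per
# level of each character column.
one_hot <- function(df) {
  blocks <- lapply(names(df), function(col) {
    levs <- sort(unique(df[[col]]))
    ind <- outer(df[[col]], levs, "==") * 1  # rows x levels indicator matrix
    colnames(ind) <- paste(col, levs, sep = ".")
    ind
  })
  do.call(cbind, blocks)
}

# same three rows as in the example above
all_o <- data.frame(
  A = c("a", "a", "c"), B = c("b", "b", "a"),
  C = c(1, 3, 3), D = c(2, 2, 2), E = c(3, 3, 3), F = c(2, 4, 4),
  stringsAsFactors = FALSE
)

is_chr <- vapply(all_o, is.character, logical(1))
combined <- cbind(one_hot(all_o[is_chr]), as.matrix(all_o[!is_chr]))

# one dist() over all columns (pairs 1-2, 1-3, 2-3)
(combined_dists <- as.numeric(dist(combined)))  # sqrt(8), sqrt(12), 2
```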
