Friends,
I am using hierarchical clustering (the hclust function) to cluster my data. Before calling hclust, I need to compute the distances between the Latitude and Longitude coordinates, which I store in the variable "d". I used to compute this variable with the distm function, but it was too slow for large data sets, so I changed it. Previously it looked like this: d <- as.dist(distm(coordinates))
I then passed it to hclust to build fit.average, and it worked. Now I am using a different function to calculate the distance, as you can see in the code below:
library(geosphere)
library(dplyr)
df <- structure(list(
  Industries = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
  Latitude = c(-23.8, -23.8, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9),
  Longitude = c(-49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.7, -49.7, -49.7, -49.7, -49.7, -49.6, -49.6, -49.6, -49.6)),
  class = "data.frame", row.names = c(NA, -19L))
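# Sort the points by longitude, then latitude
ordered_df <- df %>%
  arrange(Longitude, Latitude)
# Longitude/Latitude matrix, the column order geosphere expects
coordinates <- ordered_df %>%
  select(Longitude, Latitude) %>%
  as.matrix()
# New distance calculation that replaced as.dist(distm(coordinates))
d <- data.frame(Dist = c(0, distVincentyEllipsoid(coordinates)))
# This is the call that throws the error
fit.average <- hclust(d, method = "average")
k <- 3
clusters <- cutree(fit.average, k)
df$cluster <- clusters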
However, the hclust call that creates fit.average throws an error. Could you help me solve it?
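For comparison, the earlier distm-based version that ran without error looked roughly like the sketch below. It is self-contained and rebuilds the same 19 points directly as a matrix; skipping the arrange step is an assumption made here so that the cluster labels line up with the original row order, and distm is left at its default distance function.

```r
library(geosphere)

# Same 19 points as in the data above, as a Longitude/Latitude matrix.
# Assumption: rows are kept in their original order (no arrange step).
coordinates <- cbind(
  Longitude = c(rep(-49.6, 10), rep(-49.7, 5), rep(-49.6, 4)),
  Latitude  = c(rep(-23.8, 2), rep(-23.9, 17))
)

# distm returns the full 19 x 19 matrix of pairwise distances in meters;
# as.dist converts it to the "dist" object that hclust expects.
d <- as.dist(distm(coordinates))

fit.average <- hclust(d, method = "average")
k <- 3
clusters <- cutree(fit.average, k)
```

Here d has class "dist", whereas in the new version d is a data.frame, which is the only difference between the two in what gets passed to hclust.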
Thank you very much!