# Faster way to find the shortest distance with distHaversine

Dear R experts,

I have a data set (data1) in which every subject has its own location (latitude, longitude) and another data set (data2) in which every store has its location and area. I want to find the store closest to each subject. My script is below:

library(geosphere)

data1=data.frame(id=c(123,456,789),
latitude=c(23.4567,24.4567,25.4567),
longitude=c(120.4567,120.3567,120.1567))

data2=data.frame(name=c(123,456,789),
area=c('a','b','c'),
latitude=c(23.123,24.456,26.789),
longitude=c(120.3367,120.4567,120.2567))

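# Pre-allocate the result columns so skipped rows stay NA
data1$d=NA_real_
data1$store=NA_real_
data1$area=NA_character_

for (i in 1:nrow(data1)) {
  if (!is.na(data1$latitude[i])) {
    # Distance in km from every store to subject i
    d=distm(data2[,c('longitude','latitude')],
            data1[i,c('longitude','latitude')],fun=distHaversine)/1000
    j=which.min(d)  # index of the nearest store
    data1$d[i]=d[j]
    data1$store[i]=data2$name[j]  # data2 has no 'store' column; the store id is in 'name'
    data1$area[i]=as.character(data2$area[j])
  }
}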

In the end, the nearest store, its area, and the distance are attached to data1.

The problem is that data1 actually has 600,000 rows and data2 has 180 rows, and the loop takes forever to finish.

Is there any faster way to achieve this?
Any advice will be appreciated.

Best,
Veda

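Have a look at st_distance in the sf package. There is no need to run it in a loop: it returns the full matrix of distances between two sets of points. With lat/long data in a geographic coordinate system it computes great-circle distances directly. If you want planar distances instead, you'll need to reproject to a projected coordinate system, and you can transform back to a geographic coordinate system if you need the lat/long.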
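
As a rough sketch of the st_distance approach, using the sample data from the question: the points are assumed to be WGS84 lon/lat (EPSG:4326), which the question does not state, and `subjects`, `stores`, `d_km` and `nearest` are names made up for illustration. Rows with missing coordinates would have to be filtered out before `st_as_sf`, which rejects NA coordinates by default. The distances come from sf rather than distHaversine, so they may differ slightly from the loop's results.

```r
library(sf)

data1 <- data.frame(id = c(123, 456, 789),
                    latitude = c(23.4567, 24.4567, 25.4567),
                    longitude = c(120.4567, 120.3567, 120.1567))
data2 <- data.frame(name = c(123, 456, 789),
                    area = c("a", "b", "c"),
                    latitude = c(23.123, 24.456, 26.789),
                    longitude = c(120.3367, 120.4567, 120.2567))

# Assumption: coordinates are WGS84 longitude/latitude (EPSG:4326)
subjects <- st_as_sf(data1, coords = c("longitude", "latitude"), crs = 4326)
stores   <- st_as_sf(data2, coords = c("longitude", "latitude"), crs = 4326)

# One call gives an nrow(subjects) x nrow(stores) matrix of distances in metres
d_km <- units::drop_units(st_distance(subjects, stores)) / 1000

# Row-wise index of the smallest distance (max.col on the negated matrix)
nearest <- max.col(-d_km, ties.method = "first")

data1$d     <- d_km[cbind(seq_len(nrow(d_km)), nearest)]
data1$store <- data2$name[nearest]
data1$area  <- as.character(data2$area[nearest])
```

With 600,000 subjects and 180 stores the full matrix holds 108 million doubles (roughly 860 MB), so it may be worth processing data1 in chunks of, say, 50,000 rows if memory is tight.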

Perhaps st_nearest_points too, or st_nearest_feature, which returns the index of the nearest feature directly.
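
A possible variant that avoids building the full distance matrix is sketched below. It uses `st_nearest_feature` from sf (not `st_nearest_points`, which returns the connecting lines rather than an index) and then computes only the matched distances with `by_element = TRUE`. As before, WGS84 lon/lat is an assumption, and `subjects`, `stores` and `idx` are illustrative names.

```r
library(sf)

data1 <- data.frame(id = c(123, 456, 789),
                    latitude = c(23.4567, 24.4567, 25.4567),
                    longitude = c(120.4567, 120.3567, 120.1567))
data2 <- data.frame(name = c(123, 456, 789),
                    area = c("a", "b", "c"),
                    latitude = c(23.123, 24.456, 26.789),
                    longitude = c(120.3367, 120.4567, 120.2567))

# Assumption: coordinates are WGS84 longitude/latitude (EPSG:4326)
subjects <- st_as_sf(data1, coords = c("longitude", "latitude"), crs = 4326)
stores   <- st_as_sf(data2, coords = c("longitude", "latitude"), crs = 4326)

# For each subject, the row index of the nearest store
idx <- st_nearest_feature(subjects, stores)

# Distance only between each subject and its matched store, converted to km
data1$d     <- as.numeric(st_distance(subjects, stores[idx, ], by_element = TRUE)) / 1000
data1$store <- data2$name[idx]
data1$area  <- as.character(data2$area[idx])
```

This keeps memory use proportional to the number of subjects, which may matter more than raw speed at 600,000 rows.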
