so ineed to use Haversine function from packge "pracma" , the problome is that i dont know how to use the function on big data , lets say i have 1000 values of latitude and longatuides how do i make the function calculatin the distance between two points when every points?
For every combination of a bunch of values, you might take a look at expand.grid()
.
From the haversine()
docs, it looks like:
The location can be input in two different formats, as latitude and longitude in a character string, e.g. for Frankfurt airport as '50 02 00N, 08 34 14E', or as a numerical two-vector in degrees (not radians).
You could then calculate haversine()
for all combinations kept in a data frame, for example:
df <- tibble::tibble(x = '41 58 43N, 87 54 17W', y = '50 02 00N, 08 34 14E')
pracma::haversine(df$x, df$y)
#> [1] 6971.059
Created on 2019-03-18 by the reprex package (v0.2.1)
ok ty , but im not sure im quit understand how to calculate for 1000 datas sets?
for exmple if ihave the data "quaks" each earthquaks have its own lattitude and longatuide how do i use this function ? im suppost to creat anew data frame or something ?
No, probably a new variable. Check out dplyr::mutate()
idid try to add with mutate new colunm that will calculate but the HARVESTIN said i put worng cordinat format , can you show me example ? i mean how do i put it in mutate
Using a small subset of your data, could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.
install.packages("reprex")
If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.
There's also a nice FAQ on how to do a minimal reprex for beginners, below:
What to do if you run into clipboard problems
If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.
reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")
For pointers specific to the community site, check out the reprex FAQ.
here is my reprex :
attach(quakes)
library(pracma)
#> Warning: package 'pracma' was built under R version 3.5.3
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
x=mutate(quakes,dist=haversine(quakes$lat,quakes$long))
#> Error in mutate_impl(.data, dots): Evaluation error: Coordinate input not in correct format..
Created on 2019-03-19 by the reprex package (v0.2.1)
If you look at the documentation for the haversine()
function (linked above, and can also be seen by running ?haversine
when pracma is attached, you'll see that the first two arguments, loc1
and loc2
each take a specific format.
Haversine Formula
Haversine formula to calculate the arc distance between two points on earth (i.e., along a great circle).
Usage
haversine(loc1, loc2, R = 6371.0)
Arguments
loc1, loc2
Locations on earth; for format see Details.
RAverage earth radius R = 6371 km, can be changed on input.
Details
The Haversine formula is more robust for the calculating the distance as with the spherical cosine formula. The user may want to assume a slightly different earth radius, so this can be provided as input.
The location can be input in two different formats, as latitude and longitude in a character string, e.g. for Frankfurt airport as '50 02 00N, 08 34 14E', or as a numerical two-vector in degrees (not radians).
Here for latitude 'N' and 'S' stand for North and South, and for longitude 'E' or 'W' stand for East and West. For the degrees format, South and West must be negative.
These two formats can be mixed.
The data you have given it is not in that format. You're giving it latitude for loc1
and longitude for loc2
.
The two possible formats are shown in the example for haversine()
:
library(pracma)
FRA = '50 02 00N, 08 34 14E' # Frankfurt Airport
ORD = '41 58 43N, 87 54 17W' # Chicago O'Hare Interntl. Airport
fra <- c(50+2/60, 8+34/60+14/3600)
ord <- c(41+58/60+43/3600, -(87+54/60+17/3600))
dis <- haversine(FRA, ORD) # 6971.059 km
fprintf('Flight distance Frankfurt-Chicago is %8.3f km.\n', dis)
#> Flight distance Frankfurt-Chicago is 6971.059 km.
dis <- haversine(fra, ord)
fprintf('Flight distance Frankfurt-Chicago is %8.3f km.\n', dis)
#> Flight distance Frankfurt-Chicago is 6971.059 km.
Created on 2019-03-19 by the reprex package (v0.2.1)
I think haversine()
is not vectorized, (I hadn't used the pracma package until responding to you here), so you might need to use purrr, rather than just mutate()
. The Stack Overflow thread below should help, they're using a different package, but the same approach should work:
yeha the data is not at the format of loc rather each long and lat of there own uniqe colum , i tried to unit them in to one colum with no success
Do you have to use pracma
implementation to get the distance matrix? Otherwise, the following seems OK to me. Actually, purrr
is not required.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(geosphere) # contains the distm function
locations_tidyverse <-
quakes %>%
head() %>%
select(long, lat) %>% # distm requires longitude first, before latitude
mutate(long = if_else(condition = (long > 180),
true = (long - 360),
false = long)) # distm use -180:180 longitude format
locations_baseR <- within(data = head(quakes)[2:1],
expr = {
long <- ifelse(test = (long > 180),
yes = (long - 360),
no = long)
})
distm(x = locations_tidyverse,
fun = distHaversine)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0.00 65416.35 670925.4 272765.2 35470.22 293108.9
#> [2,] 65416.35 0.00 676068.9 302329.8 99481.44 358385.5
#> [3,] 670925.45 676068.89 0.0 928617.5 658574.01 703868.5
#> [4,] 272765.16 302329.84 928617.5 0.0 274549.94 337917.6
#> [5,] 35470.22 99481.44 658574.0 274549.9 0.00 259181.2
#> [6,] 293108.93 358385.48 703868.5 337917.6 259181.18 0.0
distm(x = locations_baseR,
fun = distHaversine)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0.00 65416.35 670925.4 272765.2 35470.22 293108.9
#> [2,] 65416.35 0.00 676068.9 302329.8 99481.44 358385.5
#> [3,] 670925.45 676068.89 0.0 928617.5 658574.01 703868.5
#> [4,] 272765.16 302329.84 928617.5 0.0 274549.94 337917.6
#> [5,] 35470.22 99481.44 658574.0 274549.9 0.00 259181.2
#> [6,] 293108.93 358385.48 703868.5 337917.6 259181.18 0.0
i dont have to use the pcarma it was just a reccomendation that iget from the asseigment , i can use other methods if possible , i want to try to do what have you done but what does %>% mean? sorry im noob
This is called pipe
, implemented in the magrittr
package.
If you want to do f(g(x,y),z)
, then in this notation, it becomes x %>% g(y) %>% f(z)
.
You can find more about this here:
I edited my previous post to add an equivalent approach, with which I expect you to be familiar.
ty i get it now , but i think i need to use pracma becuse ineed to add the calculation as new colum in the quakes data and i cant do it with the method use used
locations <-
quakes %>%
head() %>%
select(long, lat) %>%
mutate(long = if_else(condition = (long > 180),
true = (long - 360),
false = long))
#> Error in quakes %>% head() %>% select(long, lat) %>% mutate(long = if_else(condition = (long > : could not find function "%>%"
distm(x = locations,
fun = distHaversine)
#> Error in distm(x = locations, fun = distHaversine): could not find function "distm"
quakes_distance=mutate(quakes,distance=distm(x = locations,
fun = distHaversine))
#> Error in mutate(quakes, distance = distm(x = locations, fun = distHaversine)): could not find function "mutate"
Created on 2019-03-19 by the reprex package (v0.2.1)
You need to include the libraries you're using inside your reprex.
Along with what Mara said, you can actually add a new column to the existing dataset even with this approach:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(geosphere)
library(tidyr)
locations <-
quakes %>%
head() %>%
select(long, lat) %>%
mutate(long = if_else(condition = (long > 180),
true = (long - 360),
false = long)) %>%
crossing(., .) %>%
rowwise() %>%
mutate(distances = distm(x = c(long, lat),
y = c(long1, lat1),
fun = distHaversine))
as.data.frame(x = locations)
#> long lat long1 lat1 distances
#> 1 -178.38 -20.42 -178.38 -20.42 0.00
#> 2 -178.38 -20.42 -178.97 -20.62 65416.35
#> 3 -178.38 -20.42 -175.90 -26.00 670925.45
#> 4 -178.38 -20.42 -178.34 -17.97 272765.16
#> 5 -178.38 -20.42 -178.04 -20.42 35470.22
#> 6 -178.38 -20.42 -175.69 -19.68 293108.93
#> 7 -178.97 -20.62 -178.38 -20.42 65416.35
#> 8 -178.97 -20.62 -178.97 -20.62 0.00
#> 9 -178.97 -20.62 -175.90 -26.00 676068.89
#> 10 -178.97 -20.62 -178.34 -17.97 302329.84
#> 11 -178.97 -20.62 -178.04 -20.42 99481.44
#> 12 -178.97 -20.62 -175.69 -19.68 358385.48
#> 13 -175.90 -26.00 -178.38 -20.42 670925.45
#> 14 -175.90 -26.00 -178.97 -20.62 676068.89
#> 15 -175.90 -26.00 -175.90 -26.00 0.00
#> 16 -175.90 -26.00 -178.34 -17.97 928617.55
#> 17 -175.90 -26.00 -178.04 -20.42 658574.01
#> 18 -175.90 -26.00 -175.69 -19.68 703868.46
#> 19 -178.34 -17.97 -178.38 -20.42 272765.16
#> 20 -178.34 -17.97 -178.97 -20.62 302329.84
#> 21 -178.34 -17.97 -175.90 -26.00 928617.55
#> 22 -178.34 -17.97 -178.34 -17.97 0.00
#> 23 -178.34 -17.97 -178.04 -20.42 274549.94
#> 24 -178.34 -17.97 -175.69 -19.68 337917.64
#> 25 -178.04 -20.42 -178.38 -20.42 35470.22
#> 26 -178.04 -20.42 -178.97 -20.62 99481.44
#> 27 -178.04 -20.42 -175.90 -26.00 658574.01
#> 28 -178.04 -20.42 -178.34 -17.97 274549.94
#> 29 -178.04 -20.42 -178.04 -20.42 0.00
#> 30 -178.04 -20.42 -175.69 -19.68 259181.18
#> 31 -175.69 -19.68 -178.38 -20.42 293108.93
#> 32 -175.69 -19.68 -178.97 -20.62 358385.48
#> 33 -175.69 -19.68 -175.90 -26.00 703868.46
#> 34 -175.69 -19.68 -178.34 -17.97 337917.64
#> 35 -175.69 -19.68 -178.04 -20.42 259181.18
#> 36 -175.69 -19.68 -175.69 -19.68 0.00
The only problem with the above (as far as I can see yet), it contains some useless rows. Since distance is symmetric, \binom{n}{2} rows are expected, but here we have n^2 rows. Certainly, calculating distances between same locations is not required, and neither are the rows calculating distances between B and A, if we already calculated that between A and B.
But I can't provide a solution to that. I'm sure it's something very trivial, but I fail to spot it. I'm sure someone else will provide a better solution taking care of this.
Most probably, you've already figured out how to avoid the extra rows by now. But to make this thread complete (and for my own satisfaction), I'm posting a solution based on the following post:
Here's the solution:
# loading libraries
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(geosphere) # contains the distHavershine function
all_locations <-
quakes %>% # original dataset
head() %>% # taking a subset for illustration (not needed, obviously!!)
select(long, lat) %>% # taking only the coordinates
mutate(long = if_else(condition = (long > 180),
true = (long - 360),
false = long)) # converting 0:360 system to -180:180 system
as.data.frame(x = all_locations)
#> long lat
#> 1 -178.38 -20.42
#> 2 -178.97 -20.62
#> 3 -175.90 -26.00
#> 4 -178.34 -17.97
#> 5 -178.04 -20.42
#> 6 -175.69 -19.68
all_location_combinations <-
as_tibble(x = t(x = apply(X = combn(x = (1:nrow(all_locations)),
m = 2), # creating row combinations
MARGIN = 2, # performing for each combination
FUN = function(u) {
unlist(x = all_locations[u, ]) # de-stacking rows
}))) %>%
arrange(long1, lat1, long2, lat2) # changing order (not needed)
as.data.frame(x = all_location_combinations)
#> long1 long2 lat1 lat2
#> 1 -178.97 -178.34 -20.62 -17.97
#> 2 -178.97 -178.04 -20.62 -20.42
#> 3 -178.97 -175.90 -20.62 -26.00
#> 4 -178.97 -175.69 -20.62 -19.68
#> 5 -178.38 -178.97 -20.42 -20.62
#> 6 -178.38 -178.34 -20.42 -17.97
#> 7 -178.38 -178.04 -20.42 -20.42
#> 8 -178.38 -175.90 -20.42 -26.00
#> 9 -178.38 -175.69 -20.42 -19.68
#> 10 -178.34 -178.04 -17.97 -20.42
#> 11 -178.34 -175.69 -17.97 -19.68
#> 12 -178.04 -175.69 -20.42 -19.68
#> 13 -175.90 -178.34 -26.00 -17.97
#> 14 -175.90 -178.04 -26.00 -20.42
#> 15 -175.90 -175.69 -26.00 -19.68
distances_between_combinations <-
all_location_combinations %>%
rowwise() %>% # performing for each row
mutate(distances = distm(x = c(long1, lat1),
y = c(long2, lat2),
fun = distHaversine)) # finding distances
as.data.frame(x = distances_between_combinations)
#> long1 long2 lat1 lat2 distances
#> 1 -178.97 -178.34 -20.62 -17.97 302329.84
#> 2 -178.97 -178.04 -20.62 -20.42 99481.44
#> 3 -178.97 -175.90 -20.62 -26.00 676068.89
#> 4 -178.97 -175.69 -20.62 -19.68 358385.48
#> 5 -178.38 -178.97 -20.42 -20.62 65416.35
#> 6 -178.38 -178.34 -20.42 -17.97 272765.16
#> 7 -178.38 -178.04 -20.42 -20.42 35470.22
#> 8 -178.38 -175.90 -20.42 -26.00 670925.45
#> 9 -178.38 -175.69 -20.42 -19.68 293108.93
#> 10 -178.34 -178.04 -17.97 -20.42 274549.94
#> 11 -178.34 -175.69 -17.97 -19.68 337917.64
#> 12 -178.04 -175.69 -20.42 -19.68 259181.18
#> 13 -175.90 -178.34 -26.00 -17.97 928617.55
#> 14 -175.90 -178.04 -26.00 -20.42 658574.01
#> 15 -175.90 -175.69 -26.00 -19.68 703868.46
Created on 2019-03-22 by the reprex package (v0.2.1)
wow ty , until now icoudnt get it right and you solve it
although idont know how to add the new colun distance to orignal quakes data i dont think that possiable
Obviously, it's possible. You can just cbind
the rest of the columns.
I misunderstood. This can be done if you make 10 + 1 columns in total. But that's unnecessary in my opinion.
i did try c bind but the prbolome is that my orignal data quaks contain 1000 object and ofcoruse the distance contain higer objerts