Not for the first time, I'm trying to solve the following problem.
I am trying to loop over the rows of a data frame and put the results back into that data frame. Each row produces more than one new row, and I'm tripping up on it.
For each city in a data frame, I want to get a list of the nearest stations. I'd like one row per city-station pair, or at least a nested data frame (which I'd later unnest to get the desired pairs anyway).
The output of GSODR::nearest_stations is a vector, and I can't bend it to appear row-wise. Please send help.
Here is a reprex:
library(tidyverse)
library(maps)
library(GSODR)
# Get a sample of cities
cities <- maps::world.cities %>%
  filter(country.etc == 'USA') %>%
  arrange(desc(pop)) %>%
  select(name, lat, long) %>%
  head(10)
# Get nearest stations for each city.
# This works, but stations are one long vector per city.
# Expected output for the function, but not what I want
stations_c <- cities %>%
  mutate(stations = purrr::map2(lat, long, nearest_stations, distance = 10))
# This method doesn't work.
stations_rows <- cities %>%
  mutate(stations = purrr::map2_dfr(lat, long, nearest_stations, distance = 10))
#> Error in mutate_impl(.data, dots): Evaluation error: Argument 1 must have names.
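A likely reason for this error is that map2_dfr() row-binds the results, and row-binding needs named columns, while nearest_stations returns an unnamed vector. Here is a minimal sketch of that idea, assuming a made-up stand-in function (fake_nearest, with invented station IDs) so it runs without GSODR. Note that the bound result generally has a different number of rows than cities, so it still could not be assigned as a column inside mutate().

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for nearest_stations(): returns an unnamed character vector.
# The station IDs are invented for illustration.
fake_nearest <- function(lat, lon) {
  c("AAA-1", "BBB-2")
}

lat <- c(40.67, 41.84)
lon <- c(-73.94, -87.68)

# Wrapping each result in a tibble gives the row-binding step a named column
stations_df <- map2_dfr(lat, lon, function(x, y) tibble(station = fake_nearest(x, y)))
```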
You can unnest() the list of vectors you get in stations_c, which sounds like the kind of output you want (I may have misunderstood what the final version should look like).
The downside to this is that cities without any stations within the given distance, like LA, aren't in the output.
cities %>%
  mutate(stations = purrr::map2(lat, long, nearest_stations, distance = 10)) %>%
  # Name the list column explicitly; newer tidyr versions warn on a bare unnest()
  unnest(stations)
       name   lat   long     stations
1  New York 40.67 -73.94 720553-99999
2  New York 40.67 -73.94 744976-99999
3  New York 40.67 -73.94 997271-99999
4   Chicago 41.84 -87.68 725340-14819
5   Chicago 41.84 -87.68 725346-94866
6   Chicago 41.84 -87.68 725346-99999
7   Chicago 41.84 -87.68 998499-99999
8   Chicago 41.84 -87.68 999999-14819
9   Chicago 41.84 -87.68 999999-94866
10  Houston 29.77 -95.39 720594-00188
...
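If the dropped cities matter, one option in tidyr 1.0.0 or later is the keep_empty argument of unnest(). Here is a small sketch on a toy data frame (the station IDs are made up, and the empty vector stands in for a city with no stations in range), so it runs without GSODR:

```r
library(tidyr)
library(tibble)

# Toy stand-in for the stations_c list column; station IDs are invented
toy <- tibble(
  name = c("New York", "Los Angeles"),
  stations = list(c("AAA-1", "BBB-2"), character(0))
)

# Default: the row holding an empty vector is dropped
dropped <- unnest(toy, stations)

# keep_empty = TRUE keeps that row, with NA in the unnested column
kept <- unnest(toy, stations, keep_empty = TRUE)
```

With the real data, the same argument would be added to the unnest() call at the end of the pipeline.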
I see what you did there! I had a similar iteration that left the data frame (implicitly or explicitly) and then joined the result back to the original df. But I was sure that there had to be a way to do this without leaving the data frame.
And sure enough, @aosmith's solution is exactly what I was looking for. Brilliant!