R only able to handle 500 rows or so before connection error

I am using a function I found here, r - Convert Lat/Lon to County Codes using FCC API - Stack Overflow, to convert lat/long points to census tracts.

I have a dataset of 43k rows. The function works great for about 500 rows, then it crashes and assigns the remaining 42,500 rows to the same tract, which is not correct. Here is the error message I get, followed by the code I am using:

Error in open.connection(con, "rb") :
cannot open the connection to 'https://geo.fcc.gov/api/census/area?lat=35.944191&lon=-81.175762&format=json'

geo2fips <- function(latitude, longitude) {
  # Query the FCC area API and return the block FIPS code for one point
  url <- "https://geo.fcc.gov/api/census/area?lat=%f&lon=%f&format=json"
  res <- jsonlite::fromJSON(sprintf(url, latitude, longitude))[["results"]][["block_fips"]]
  res
}

for (i in 1:nrow(tract_data)) {
  tract_data$tracts[i] <- geo2fips(tract_data$Latitude[i], tract_data$Longitude[i])
}

Any insight would be much appreciated. Thank you

Hi @jillahmad17
You are probably overloading the server with too many requests, too quickly. Try adding a short time delay in your loop:

for (i in 1:nrow(tract_data)) {
  tract_data$tracts[i] <- geo2fips(tract_data$Latitude[i], tract_data$Longitude[i])
  Sys.sleep(1)  # pause between requests so the server isn't flooded
}

Mind you, with a 1-second delay between requests, 43,000 of them are going to take almost 12 hours! If that doesn't work, try requesting the data in blocks of, say, 400 rows with a few seconds between each block.
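If you go the block route, something like this rough sketch might do it (it assumes the same tract_data and geo2fips() as above; the block size of 400 and the 5-second pause are just illustrative values to tune):

block_size <- 400                      # illustrative; tune to what the API tolerates
n <- nrow(tract_data)

for (s in seq(1, n, by = block_size)) {
  e <- min(s + block_size - 1, n)      # last row of this block
  for (i in s:e) {
    tract_data$tracts[i] <- geo2fips(tract_data$Latitude[i], tract_data$Longitude[i])
  }
  Sys.sleep(5)                         # a few seconds between blocks
}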


If the server is rate throttling, it might be possible to get by with a shorter delay than 1 second, but that's still not optimal, given the alternatives.

Both the {tigris} and {tidycensus} packages allow downloading files with county FIPS codes and an {sf} (simple features) representation of the county boundaries. That lets you use functions in {sf} such as st_intersects() to write a script that assembles a data-frame-like object with the FIPS code, county name, and lat/lon geometries.
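For example, something along these lines (a sketch only, not run against your data; it assumes the {tigris} and {sf} packages are installed, uses tracts() rather than counties() since the original goal is tract codes, and the choice of North Carolina is simply taken from the coordinates in the error message):

library(tigris)
library(sf)

# Download tract boundaries once as an sf object (state is illustrative)
nc_tracts <- tracts(state = "NC", cb = TRUE)

# Turn the lat/lon data frame into an sf point layer (WGS84),
# then match its CRS to the tract layer before joining
pts <- st_as_sf(tract_data, coords = c("Longitude", "Latitude"), crs = 4326)
pts <- st_transform(pts, st_crs(nc_tracts))

# Spatial join: each point picks up the GEOID (tract FIPS) of the
# polygon it falls in -- no web requests needed
pts_with_tract <- st_join(pts, nc_tracts["GEOID"], join = st_within)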

And along with a subsecond delay, maybe build in a retry loop using the try() function.


Thank you for your insight!! How would the retry loop look?

I don't have an example using fromJSON, but I do have one for reading from a database that sometimes times out. It doesn't include a delay; the main thing is that you wrap your fromJSON call in try() and test the class of the response:

Try <- maxtries  # maxtries is set elsewhere in my script

while (TRUE) {
  # Attempt the read; try() returns a "try-error" object instead of stopping
  DataRead <- try(dbGetQuery(dbcon, QueryString))
  if (!inherits(DataRead, "try-error")) break  # success, leave the loop
  Try <- Try - 1
  if (Try <= 0) break                          # give up after maxtries failures
  dbcon <<- OpenDataLake()                     # reopen the connection, ready for retry
  LogMessage(QueryString, " read error. retrying")
}
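Adapting that pattern to the FCC lookup might look roughly like this (a sketch only, building on the geo2fips() from the original post; max_tries, the 2-second pause, and the NA fallback are all illustrative choices):

geo2fips_retry <- function(latitude, longitude, max_tries = 3) {
  url <- "https://geo.fcc.gov/api/census/area?lat=%f&lon=%f&format=json"
  for (attempt in seq_len(max_tries)) {
    res <- try(jsonlite::fromJSON(sprintf(url, latitude, longitude)), silent = TRUE)
    if (!inherits(res, "try-error")) {
      return(res[["results"]][["block_fips"]])   # success: return the FIPS code
    }
    Sys.sleep(2)                                 # brief pause before retrying
  }
  NA_character_                                  # give up after max_tries failures
}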
