Download multiple files using “download.file” function while skipping links that do not exist (with walk2)

Hello, I am trying to perform a task similar to this one, but I want to download tif files instead of csv files.

I do not know what I am doing wrong. I tried both passing otherwise=NULL to safely and passing a print message, but in every case, when it reaches a file that does not exist (e.g. 29 February, wbgtmax.2015.02.29.tif), I get this error message:

Caused by error in `download.file()`:
! cannot open URL 'https://data.chc.ucsb.edu/people/cascade/UHE-daily/wbgtmax/2015/wbgtmax.2015.02.29.tif'
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: In download.file(url = x, destfile = y, mode = "wb") :
  downloaded length 0 != reported length 0
2: In download.file(url = x, destfile = y, mode = "wb") :
  cannot open URL 'https://data.chc.ucsb.edu/people/cascade/UHE-daily/wbgtmax/2015/wbgtmax.2015.02.29.tif': HTTP status was '404 Not Found'

This is my code


library(glue)
library(purrr)
library(dplyr)
getOption('timeout') # Look at the timeout option and increase it 
options(timeout=300)

days=c("01","02","03","04","05","06","07","08","09","10","11","12","13","14","15",
       "16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31")

months=c("01","02","03","04","05","06","07","08","09","10","11","12")


# Creates a String of the URL Addresses
urls <- 
  tidyr::expand_grid(months, days) |> 
  glue_data("https://data.chc.ucsb.edu/people/cascade/UHE-daily/wbgtmax/2015/wbgtmax.2015.{months}.{days}.tif")

head(urls, 5)  

# Creates Names for the Files 
tif_names <- 
  tidyr::expand_grid(months, days) %>%
  glue_data("wbgtmax.2015.{months}.{days}.tif")


setwd("C:/Users/angel/Documents/wbgtmax/2015")

safe_download=function(x,y){
  safely(download.file(url=x,destfile = y,mode = "wb"),
         otherwise = print(glue("{x} file not found")))}


walk2(urls,tif_names,\(x,y)safe_download(x=x,y=y))

Can anyone help me understand what I am doing wrong? I have used safely in the past but have never run into such an issue.

Thanks

Hi @angela_italy
Your code couldn't find a file because 29-Feb-2015 did not exist (2015 was not a leap year).
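
One way to avoid requesting impossible dates in the first place is to build the file names from real calendar dates instead of every month/day combination. This is a minimal sketch, assuming the same URL pattern as in the question; base_url and dates are names introduced here for illustration. It does not guarantee that the server holds a file for every real date, so guarding the download itself is still worthwhile.

```r
# All real days of 2015 (2015 is not a leap year, so no 29 February)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")

# Assumed base URL, copied from the question
base_url <- "https://data.chc.ucsb.edu/people/cascade/UHE-daily/wbgtmax/2015/"

# File names follow the pattern wbgtmax.YYYY.MM.DD.tif
tif_names <- format(dates, "wbgtmax.%Y.%m.%d.tif")
urls <- paste0(base_url, tif_names)

length(urls)
head(tif_names, 3)
```

The resulting urls and tif_names vectors have the same length and order, so they can be passed to walk2() as in the original code.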

@DavoWW I know that. What I would like is for safely to skip the files that do not exist.

Hi @angela_italy
This code may do the job. I tested it on some of the URLs you supplied but none were available.

# Subset of URLs for Testing
length(urls)
urls <- urls[1:5]

# See: https://stackoverflow.com/questions/60318926/how-to-check-if-file-exists-in-the-url-before-use-download-file-in-r

# Helper function
url_exists <- function(url){
  HTTP_STATUS_OK <- 200
  hd <- httr::HEAD(url)
  status <- hd$all_headers[[1]]$status
  result <- list(exists = status == HTTP_STATUS_OK, status = status)
  return(result)
}

for (ii in seq_along(urls)) {
  inurl <- urls[ii]
  result <- url_exists(inurl)  # store the check so it can be tested below
  Sys.sleep(2)  # slight delay to manage loop speed
  if (isTRUE(result$exists)) {
    destfile <- basename(inurl)
    download.file(url = inurl, destfile = destfile, mode = "wb")
    print(paste(destfile, ": download complete"))
  } else {
    print(paste(basename(inurl), ": file not found. Continuing to next."))
  }
}

Thank you @DavoWW
So my understanding is that the problem is not the safely function but the download function, and that the error should be prevented before running it (e.g. by checking that the file exists).

Replace your definition of safe_download with:

safe_download <- 
  purrr::safely(.f =function(x,y){download.file(url=x,destfile = y,mode = "wb")},
                quiet = FALSE)

Although it won't print exactly your "{x} file not found" message, it will otherwise work as intended.
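
If the custom message matters, one option is to inspect the error element that a safely-wrapped function returns. The sketch below is only an illustration: fake_download() is a hypothetical stand-in for download.file() so that it runs without network access, and download_or_report() is a helper name invented here.

```r
library(purrr)
library(glue)

# Hypothetical stand-in for download.file() so the sketch runs offline:
# it fails for the non-existent 29 February file.
fake_download <- function(url, destfile) {
  if (grepl("02.29", url, fixed = TRUE)) stop("cannot open URL")
  0L  # download.file() also returns 0 on success
}

# safely() receives the function itself and returns a wrapped version
safe_download <- safely(fake_download)

# Report a failure with the custom message; return TRUE on success
download_or_report <- function(x, y) {
  res <- safe_download(x, y)
  ok <- is.null(res$error)
  if (!ok) message(glue("{x} file not found"))
  ok
}

urls <- c("wbgtmax.2015.02.28.tif", "wbgtmax.2015.02.29.tif")
ok <- map2_lgl(urls, urls, download_or_report)
ok
```

With the real download.file() in place of fake_download(), the logical vector ok could be used to list which URLs were skipped.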

Thanks @nirgrahamuk. So, the problem was that I specified 'safely' within the function rather than outside it... wasn't it?

I would phrase it this way: safely is a wrapper that alters a function's behaviour, so it should receive a function definition. Your formulation wrapped it around a call instead, and I think download.file() was evaluated with x and y first and its result passed to purrr::safely, so safely received not a function but the outcome of evaluating one.
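
The difference can be seen in a small sketch that uses stop() in place of a failing download, so it runs offline. The names eager, wrapped, and res are introduced here for illustration only.

```r
library(purrr)

# Passing a call: the argument is evaluated as soon as safely() inspects it,
# so the error is raised before any wrapper exists to catch it.
eager <- tryCatch(
  safely(stop("cannot open URL")),
  error = function(e) "error escaped safely()"
)

# Passing a function: safely() returns a wrapper, and the error is
# captured only when that wrapper is called.
wrapped <- safely(function() stop("cannot open URL"))
res <- wrapped()

eager
is.null(res$result)
conditionMessage(res$error)
```

In the first form the outer tryCatch() is the only thing that catches the error, which matches the behaviour seen with walk2() in the question.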
