I am writing code to download multiple PDF files from http://www.understandingwar.org/report/afghanistan-order-battle that detail information on the U.S. war in Afghanistan. The code below works when all of the links generated with glue exist on the website, but it breaks when any of them are missing.
The following code successfully downloads a single PDF file as expected.
#---- Loads Packages
library("pdftools")
library("glue")
library("tidyverse")
#---- Creates a List of All of the ORBAT PDF URLs
month <- c("January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December")
year <- c("2013", "2014", "2015", "2016", "2017")
# Creates a String of the URL Addresses
urls <-
tidyr::expand_grid(month, year) %>%
filter(month == "October" & year == "2013") %>%
glue_data("http://www.understandingwar.org/sites/default/files/AfghanistanOrbat_{month}{year}.pdf")
head(urls, 5)
# Creates Names for the PDF Files
pdf_names <-
tidyr::expand_grid(month, year) %>%
filter(month == "October" & year == "2013") %>%
glue_data("orbat-report-{month}-{year}.pdf")
head(pdf_names, 5)
#---- Downloads the PDF Files Using purrr
walk2(urls, pdf_names, download.file, mode = "wb")
The problem is that several of the links are broken. When I try to download all of the files in the list of URL addresses generated with glue_data, the code fails. Does anyone have ideas for how to skip the broken links and download only the files that do exist, while still using walk2?
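A minimal sketch of one possible workaround, assuming the urls and pdf_names vectors built above: wrap download.file with purrr::possibly() (or purrr::safely()) so that a failed download returns a fallback value instead of throwing an error, which lets walk2 continue through the remaining URLs.
#---- Downloads the PDF Files, Skipping Broken Links (sketch)
# possibly() wraps download.file so a failed request returns NULL
# instead of stopping the walk2 loop with an error
safe_download <- possibly(download.file, otherwise = NULL, quiet = FALSE)
# Same call as before, but broken links are now skipped
walk2(urls, pdf_names, safe_download, mode = "wb")
With quiet = FALSE the error messages for the skipped files are still printed, so it is easy to see which reports are missing. An alternative would be to check each URL first (for example with httr::HEAD) and drop the broken ones before calling walk2, but wrapping the download keeps the original walk2 call unchanged.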