Hi everyone, I'm having some issues with my code and was wondering if anyone had any ideas on how to fix it?
So far I have scraped multiple direct image URLs into a data frame, and I am now using R to download the images via a temporary file.
What I would like to achieve is for R to go through every row in my data frame and download the corresponding image.
However, the loop stops partway through at a seemingly arbitrary row, and I can't figure out why.
Here's the code that builds the data frame of URLs:
library("rvest")
library("ralger")
male <- vector("list", num_pages)
# saving the urls from istockphoto
for(page_result in 1:num_pages){
link = paste0("https://www.istockphoto.com/search/2/image?alloweduse=availableforalluses&mediatype=photography&phrase=man&page=",
page_result)
male[[page_result]] <- images_preview(link)
}
male <- unlist(male)
male <- as.data.frame(male) # make it a data frame
# adding IDs to dataset
data <- tibble::rowid_to_column(male, "ID")
summary(data)
# drop first row
test <- data[-1,]
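As a quick sanity check that the scrape worked (the column name male comes from the as.data.frame() call above):

nrow(test)       # how many URLs were collected
head(test$male)  # peek at the first few links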
Here's the code I used to download the images:
library("jpeg")

# downloading loop: one file per row of the data frame
for (i in seq_len(nrow(test))) {  # 304 rows in my case
  myurl <- test[i, 2]  # the URL column
  a <- tempfile()
  download.file(myurl, a, mode = "wb")
  pic <- readJPEG(a)
  writeJPEG(pic, paste0("image", i, ".jpg"))  # i, not "i", so each image gets its own file
  file.remove(a)
}
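My guess is that one of the URLs returns an HTTP error (or isn't actually a JPEG, which would make readJPEG() fail) and the error aborts the whole loop. Would wrapping the body in tryCatch, like the sketch below, be the right way to skip the bad rows and keep going? This is a minimal sketch, assuming the failures are per-row download or decode errors rather than something else:

library("jpeg")

# sketch: skip over rows whose download or decode fails instead of aborting
for (i in seq_len(nrow(test))) {
  myurl <- test[i, 2]
  a <- tempfile()
  ok <- tryCatch({
    download.file(myurl, a, mode = "wb")
    pic <- readJPEG(a)
    writeJPEG(pic, paste0("image", i, ".jpg"))
    TRUE
  }, error = function(e) {
    message("Row ", i, " failed: ", conditionMessage(e))
    FALSE  # record the failure and move on to the next row
  })
  if (file.exists(a)) file.remove(a)
}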