TL;DR: I have a dataset of links that return 404 errors, but when the 404 page loads, the browser's address bar shows a useful URL. Can I access that "useful URL" in R?
I'm trying to scrape data from a webpage, but I'm (understandably) getting a 404 error for the URLs below. However, the browser shows information for the 404 link that I'd like to capture. Here's the example:
I don't actually want to get a 404 Error, but in the address bar, there's a URL that -- after some manipulation -- I can use to get the actual webpage that I want ("https://www.uscho.com/recaps/?p=171810970")
As far as I can tell, though, this URL doesn't show up anywhere in R. Running read_html(link_list[200]) just returns a 404 error.
Any idea how I can get the URL from the browser within R?
FYI, I asked this question on Stack Exchange earlier, but chances are it won't get answered there, and I thought this might be a better place to ask.
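One thing that may be worth trying before reaching for a full browser: if the address-bar URL comes from an ordinary HTTP redirect (an assumption, since it could also be set by JavaScript), httr keeps the final URL on the response object even when the status is 404. A minimal sketch, where get_final_url is a made-up helper name and link_list is the vector from the question:

```r
library(httr)

# Hypothetical helper: request a URL and report where the request ended up.
# httr::GET() does not throw an error on a 404, so the response can still be inspected.
get_final_url <- function(url) {
  resp <- GET(url)
  list(
    status    = status_code(resp),  # e.g. 404
    final_url = resp$url            # URL after any HTTP redirects were followed
  )
}

# Usage (needs network access):
# res <- get_final_url(link_list[200])
# res$status
# res$final_url
```

If final_url comes back identical to the input link, the redirect is probably happening in the browser, and a browser-driving tool such as RSelenium would be the next thing to try.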
Thanks for the response! I'll try that. By the way, are you familiar with splashr at all? I finally figured out how to get that working (still haven't quite figured out RSelenium). Do you know of any splashr method similar to getCurrentUrl()?
Thanks @cderv! I'll look into that as soon as I can. I'm gonna try to see if the RSelenium solution works, but I'm having quite the time trying to get RSelenium set up...
Last time I used it, it was with Docker.
That way you "just" download the image, run the container as a server, and connect R to this server on your local machine.
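A rough sketch of that Docker workflow, in case it helps. The image name, the port mapping, and the browser_url helper are assumptions for illustration; adjust them to whatever you actually run:

```r
library(RSelenium)

# Assumes a Selenium container is already running locally, started with
# something like (image and ports are assumptions):
#   docker pull selenium/standalone-firefox
#   docker run -d -p 4445:4444 selenium/standalone-firefox

# Hypothetical helper: open a page in the remote browser and return the
# URL that ends up in the address bar.
browser_url <- function(url, port = 4445L) {
  remDr <- remoteDriver(
    remoteServerAddr = "localhost",
    port = port,
    browserName = "firefox"
  )
  remDr$open(silent = TRUE)
  on.exit(remDr$close())       # always release the browser session
  remDr$navigate(url)
  remDr$getCurrentUrl()[[1]]   # getCurrentUrl() returns a list of length one
}

# Usage (needs the container to be up):
# browser_url(link_list[200])
```

Because the page is loaded in a real browser, this should pick up the address-bar URL even when it is set by JavaScript rather than by an HTTP redirect.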