I'm using RSelenium to scrape some JavaScript-rendered content. A bottleneck appeared: on every few pages (out of a total of ~65k, and the task recurs), around 3-5 seconds are wasted performing TLS handshakes and loading ads.
A small showcase of the problem:
I'm after the list of moves in the white box; the extra wait seems pointless.
library(RSelenium)

# A sample URL; there are roughly 65k of these to process.
url <- "http://www.gdchess.com/xqgame/gview.asp?id=0442450F8C81CB"

rD <- RSelenium::rsDriver(browser = "firefox", check = FALSE)
client <- rD$client
client$navigate(url)

# The move list lives in the element with id "movetext".
target <- client$findElement("id", "movetext")
target$getElementText()[[1]]  # what I'm after
# ~ some further processing here
The website is, I think, not accessible from outside of China. I added the gif to show the load-time problem, but I do recognise the reproducibility issue: the delay will differ from person to person since load times differ. In my case, it is the connection to googlesyndication that takes multiple seconds every few pages. If you have suggestions as to what might work, I'm all ears (and hands to implement)!
When starting the server, $pageLoadStrategy remains "normal". This GitHub response to a user with the same question sadly seems outdated, as no such slot exists in the client any more.
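From what I can tell, the strategy is meant to be set when the session is created rather than mutated on the client afterwards; rsDriver forwards extraCapabilities to the remote driver, so something along these lines might work (untested sketch; the URL is the sample one from above):

```r
library(RSelenium)

# Ask geckodriver not to wait for a full page load:
# "eager" returns at DOMContentLoaded, "none" returns immediately.
caps <- list(pageLoadStrategy = "eager")

rD <- rsDriver(browser = "firefox", check = FALSE,
               extraCapabilities = caps)
client <- rD$client

client$navigate("http://www.gdchess.com/xqgame/gview.asp?id=0442450F8C81CB")
# Ads may still be loading, but the DOM (and hence #movetext) should be ready.
target <- client$findElement("id", "movetext")
target$getElementText()[[1]]
```

With "none" you would likely need your own explicit wait for #movetext, so "eager" seems the safer middle ground here.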
I can't seem to find proper documentation on how to manipulate makeFirefoxProfile in R. Does anybody have any tips?
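For reference, my understanding is that makeFirefoxProfile takes a named list of about:config preferences and encodes them into a profile you pass as extra capabilities. Which preferences actually help is my assumption; a sketch:

```r
library(RSelenium)

# Named list of about:config preferences (my guesses at useful ones):
prof <- makeFirefoxProfile(list(
  "permissions.default.image" = 2L,     # 2 = don't load images
  "browser.cache.disk.enable" = FALSE,  # skip disk-cache churn
  "network.prefetch-next"     = FALSE   # no link prefetching
))

rD <- rsDriver(browser = "firefox", check = FALSE,
               extraCapabilities = prof)
```

This trims page weight, but it would not by itself block the googlesyndication requests.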
A crude workaround was found: after starting Selenium, one can manually install an adblocker in the browser. To do this, navigate to the top right, click "Settings", click "Add-ons and themes", search for "adblock" in the search bar there, and click the "Add" button.
The answer to this question is thus how to properly (i.e. programmatically) add an adblock extension.
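One route that avoids redoing the manual install on every run: do it once in a dedicated Firefox profile, then ship that profile into each new session with getFirefoxProfile. The profile path below is hypothetical (find yours via about:profiles):

```r
library(RSelenium)

# Assumption: an adblocker (e.g. uBlock Origin) was installed manually,
# once, into this dedicated Firefox profile.
prof_dir <- "~/.mozilla/firefox/scraper.profile"  # hypothetical path

# getFirefoxProfile() zips and encodes the existing profile so the
# adblocker travels with every fresh Selenium session.
prof <- getFirefoxProfile(prof_dir, useBase = TRUE)

rD <- rsDriver(browser = "firefox", check = FALSE,
               extraCapabilities = prof)
```

If that works, the 65k-page loop could then reuse one such session (or restart it periodically) without ever touching the Add-ons UI again.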