I'm using RSelenium to scrape some JavaScript-rendered content. A bottleneck has appeared: on every few pages (out of ~65k in total, and the task recurs), around 3-5 seconds are wasted on TLS handshakes and ad loading.
A small showcase of the problem
I'm after the list of moves in the white box. The extra wait seems pointless.
library(RSelenium)

# A sample URL; there are roughly ~65k of these to process.
url <- "http://www.gdchess.com/xqgame/gview.asp?id=0442450F8C81CB"

# Spin up a local Selenium server and a Firefox client.
rD <- RSelenium::rsDriver(browser = "firefox", check = FALSE)
client <- rD$client

client$navigate(url)  # this is where the 3-5 seconds are lost

target <- client$findElement(using = "id", value = "movetext")
target$getElementText()[[1]]  # what I'm after
# ~ some further processing here
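For context, this is roughly how I measure the per-page cost (plain system.time() from base R, nothing RSelenium-specific; the numbers of course vary by page):

# Rough per-page timing: nearly all of the elapsed time is spent inside
# navigate(), waiting for the page (ads included) to finish loading.
system.time({
  client$navigate(url)
  target <- client$findElement(using = "id", value = "movetext")
  moves <- target$getElementText()[[1]]
})  # "elapsed" runs several seconds longer than fetching the moves should need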
What I have tried so far:
- Using a Firefox profile that has an ad blocker installed:
fprof <- getFirefoxProfile(
  profDir = "C:\\Users\\D\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\tt1r5yc1.Xiangqi.selenium",
  useBase = FALSE
)
This fails with the error that the package Rcompression is not available for my version of R (a possible workaround is sketched just below).
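If I read the RSelenium docs correctly, useBase = TRUE zips the profile with base R tools instead of Rcompression, and the resulting list can be handed to rsDriver() as extraCapabilities. A sketch of what I plan to try next (untested, and the useBase behaviour is my assumption):

# Assumed workaround: useBase = TRUE should sidestep the Rcompression
# dependency by zipping the profile with base R.
fprof <- getFirefoxProfile(
  profDir = "C:\\Users\\D\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\tt1r5yc1.Xiangqi.selenium",
  useBase = TRUE
)
# The profile (ad blocker included) then rides along as a capability.
rD <- RSelenium::rsDriver(
  browser = "firefox",
  extraCapabilities = fprof,
  check = FALSE
)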
- Setting the page load timeout:
client$setTimeout(type = "page load", milliseconds = 500)  # cap page loads at 0.5 s
But I'm afraid I've misunderstood its use (my mental model is sketched below). Explicit and implicit waits also didn't help; what I tried is shown after this list.
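For the page load timeout, my (possibly wrong) mental model is that navigate() throws once the 500 ms budget is spent, but whatever part of the DOM has already arrived can still be queried. The tryCatch wrapper below is my own pattern, not something from the RSelenium docs:

# Assumed pattern: let navigate() time out early, swallow the error,
# then query the partially loaded DOM anyway.
client$setTimeout(type = "page load", milliseconds = 500)
moves <- tryCatch({
  client$navigate(url)  # expected to throw once the 500 ms budget is spent
  client$findElement(using = "id", value = "movetext")$getElementText()[[1]]
}, error = function(e) {
  # The timeout itself raises an error; retry the lookup on the partial DOM.
  tryCatch(
    client$findElement(using = "id", value = "movetext")$getElementText()[[1]],
    error = function(e2) NA_character_
  )
})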
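And for completeness, the waits I mentioned looked roughly like this; wait_for_element() is a hypothetical helper of my own, and neither approach shortens the page load itself, which is presumably why they didn't help:

# Implicit wait: findElement() polls up to 2 s for the element to appear.
client$setTimeout(type = "implicit", milliseconds = 2000)

# Explicit wait: poll for the element myself, bailing out after ~2 s.
wait_for_element <- function(client, using, value, timeout = 2, poll = 0.1) {
  deadline <- Sys.time() + timeout
  while (Sys.time() < deadline) {
    found <- tryCatch(client$findElement(using, value), error = function(e) NULL)
    if (!is.null(found)) return(found)
    Sys.sleep(poll)
  }
  NULL  # element never showed up within the timeout
}
target <- wait_for_element(client, "id", "movetext")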