RSelenium navigate long wait

I'm using RSelenium to scrape some JavaScript-rendered content. A bottleneck has appeared: every few pages (out of 65k in total, and this is a recurring task), around 3-5 seconds are wasted on TLS handshakes and loading ads.

A small showcase of the problem [GIF]

I'm after the list of moves in the white box. The extra wait seems pointless.

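library(RSelenium)

# A sample URL; there are roughly 65k of these to process
url <- "http://www.gdchess.com/xqgame/gview.asp?id=0442450F8C81CB"

# Start a Selenium server and a Firefox client (check = FALSE skips checking for newer drivers)
rD <- RSelenium::rsDriver(browser = "firefox", check = FALSE)
client <- rD$client

# With the default page load strategy, navigate() waits for the full page load, ads included
client$navigate(url)

# The move list lives in the element with id "movetext"
target <- client$findElement("id", "movetext")
target$getElementText()[[1]]  # What I'm after

# ~ some further processing here

# Shut down the browser and the Selenium server when done
client$close()
rD$server$stop()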
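Since the same scrape repeats for roughly 65k URLs, the code above would typically be wrapped in a loop that reuses one browser session. A minimal sketch, assuming an already-open RSelenium client; `scrape_moves()` and `scrape_all()` are hypothetical helper names, and a page that fails yields `NA` instead of aborting the whole run. On its own this does not remove the ad-loading delay.

```r
# Hypothetical helper: scrape the move list from one page. Any error
# (navigation failure, missing element) yields NA instead of stopping the batch.
scrape_moves <- function(client, url) {
  tryCatch({
    client$navigate(url)
    target <- client$findElement(using = "id", value = "movetext")
    target$getElementText()[[1]]
  }, error = function(e) NA_character_)
}

# Hypothetical helper: reuse one open browser session for every URL instead
# of starting a new browser per page.
scrape_all <- function(client, urls) {
  vapply(urls, function(u) scrape_moves(client, u),
         FUN.VALUE = character(1), USE.NAMES = FALSE)
}

# Usage (needs a running browser):
# rD <- RSelenium::rsDriver(browser = "firefox", check = FALSE)
# moves <- scrape_all(rD$client, urls)
# rD$client$close()
# rD$server$stop()
```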

What I have tried so far:

  1. Using a Firefox profile with an adblocker installed:
fprof <- getFirefoxProfile(profDir = "C:\\Users\\D\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\tt1r5yc1.Xiangqi.selenium", useBase = FALSE)

I get the error:
the package Rcompression is not available for this version of R.
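As far as I can tell from the RSelenium documentation, `getFirefoxProfile()` only needs Rcompression when `useBase = FALSE`; with `useBase = TRUE` it zips the profile with `utils::zip()` instead. A sketch under that assumption, untested here: on Windows, `utils::zip()` needs an external zip program (for example the one that ships with Rtools), and `start_with_profile()` is a hypothetical helper name.

```r
# Assumption: with useBase = TRUE, getFirefoxProfile() zips the profile with
# utils::zip() rather than the Rcompression package. On Windows, utils::zip()
# needs an external zip program on the PATH (for example from Rtools).
profile_dir <- "C:\\Users\\D\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\tt1r5yc1.Xiangqi.selenium"

# Hypothetical helper: start Firefox with the existing (adblocker) profile.
start_with_profile <- function(profile_dir) {
  # Fail early with a clear message if the profile folder is missing
  if (!dir.exists(profile_dir)) {
    stop("Profile directory not found: ", profile_dir)
  }
  fprof <- RSelenium::getFirefoxProfile(profDir = profile_dir, useBase = TRUE)
  RSelenium::rsDriver(browser = "firefox", check = FALSE,
                      extraCapabilities = fprof)
}

# Usage:
# rD <- start_with_profile(profile_dir)
# client <- rD$client
```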

  2. Setting a page load timeout:
client$setTimeout(type = "page load", milliseconds = 500)

But I'm afraid I misunderstood its use. Explicit and implicit waits didn't help either.

Question: Can RSelenium's navigate be terminated when the desired content is available? If so, how?
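One possible approach uses the page load timeout mentioned above as a cap on how long `navigate()` blocks: once it expires, `navigate()` raises an error, which can be caught, and the element is then polled for directly. This is a sketch assuming that the move list (`id = "movetext"`) is already in the DOM while ads are still loading, and that the driver keeps accepting commands after the timeout. `poll_for()` and `get_moves_fast()` are hypothetical helpers; `window.stop()` is the standard DOM call for aborting the remaining downloads.

```r
# Hypothetical helper: call check() until it returns non-NULL or the
# timeout (in seconds) runs out; returns NULL on timeout.
poll_for <- function(check, timeout = 5, interval = 0.1) {
  deadline <- Sys.time() + timeout
  repeat {
    value <- check()
    if (!is.null(value)) return(value)
    if (Sys.time() >= deadline) return(NULL)
    Sys.sleep(interval)
  }
}

# Hypothetical helper: return the move text as soon as it is present,
# without waiting for the full page load. `client` is an open remoteDriver.
get_moves_fast <- function(client, url, load_ms = 1500, wait_s = 5) {
  # Cap how long navigate() may block; once the cap is hit it raises an error
  client$setTimeout(type = "page load", milliseconds = load_ms)
  # Treat that error as "still loading", not as a failure
  tryCatch(client$navigate(url), error = function(e) NULL)

  text <- poll_for(function() {
    # findElements() returns an empty list while the element is absent
    found <- client$findElements(using = "id", value = "movetext")
    if (length(found) == 0) return(NULL)
    txt <- found[[1]]$getElementText()[[1]]
    if (nzchar(txt)) txt else NULL
  }, timeout = wait_s)

  # Abort whatever is still downloading (ads, trackers) before moving on
  client$executeScript("window.stop();")
  if (is.null(text)) NA_character_ else text
}
```

Because every `navigate()` error is swallowed, a genuinely broken page only shows up as `NA` after `wait_s` seconds; that trade-off keeps the loop simple.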

For some reason, the URL in your code does not lead to any website.

The website is probably not accessible from outside China. I added the GIF to show the load-time problem, but I do recognize the reproducibility problem: the delay will vary, since load times differ from person to person. In my case, it is the connection to googlesyndication that takes several seconds every few pages. If you have any suggestions as to what might work, I'm all ears (and hands to implement)!

I would really love to help; however, it is fairly difficult when I cannot put my virtual hands on the website.

Update

  3. pageLoadStrategy

Per the Selenium documentation, setting the page load strategy should stop navigate from waiting for every resource (such as ads) to finish loading.

# Attempt: set the load strategy through a Firefox profile preference
fprof <- makeFirefoxProfile(list(
  "webdriver.load.strategy" = "eager"
))

rD <- RSelenium::rsDriver(browser = "firefox", check = FALSE, extraCapabilities = fprof)

When starting the server, $pageLoadStrategy remains "normal". This GitHub response to a user with the same question sadly seems outdated, as that slot no longer exists in the client.

I can't seem to find proper documentation on how to use makeFirefoxProfile in R. Does anybody have any tips?
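A possible tip, untested against this site: `pageLoadStrategy` is a WebDriver session capability (`"normal"`, `"eager"` or `"none"`) rather than a Firefox preference, so it may need to be passed directly in `extraCapabilities` instead of through `makeFirefoxProfile()`. The sketch assumes `rsDriver()` forwards `extraCapabilities` to `remoteDriver()` (as the `fprof` call above already relies on) and that the Selenium server hands the capability on to geckodriver; `start_eager_firefox()` is a hypothetical name.

```r
# Untested sketch. pageLoadStrategy is a WebDriver session capability
# ("normal", "eager" or "none"), not a Firefox preference, so it is passed
# as a top-level capability rather than through makeFirefoxProfile().
# Assumption: rsDriver() forwards extraCapabilities to remoteDriver(), and
# the Selenium server hands the capability on to geckodriver unchanged.
eager_caps <- list(pageLoadStrategy = "eager")

# Hypothetical helper name
start_eager_firefox <- function(caps = eager_caps) {
  # Guard against typos: only the three standard strategies are valid
  if (!isTRUE(caps$pageLoadStrategy %in% c("normal", "eager", "none"))) {
    stop("pageLoadStrategy must be one of normal, eager or none")
  }
  RSelenium::rsDriver(browser = "firefox", check = FALSE,
                      extraCapabilities = caps)
}

# Usage:
# rD <- start_eager_firefox()
# rD$client$navigate(url)  # should return at DOMContentLoaded, before ads finish
```

If the adblocker profile is also needed, the profile list and this list could presumably be combined with `c()` before being passed as `extraCapabilities`.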

For future readers:

A crude workaround was found: after starting Selenium, open the browser's add-ons page and manually install an adblocker. To do this, open the menu at the top right, click "Settings", then "Add-ons and themes", search for "adblock" in the search bar there, and click the "add" button.

A proper solution to this question therefore comes down to adding an adblock extension correctly.
