Issues scraping data in R with PhantomJS

Hi people,

I've been trying to use this guide (https://www.youtube.com/watch?v=GayFRUUtHj4&t=189s) to scrape data from a football betting site (see the link in the code). I want to get the statistics table for this match. I have tried following the guide, and this is what I have so far:

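library(rvest)  # read_html(), html_nodes(), html_text(); also re-exports %>%

setwd("/Users/-----/documents/phantomjs/bin")
url <- "https://www.sofascore.com/south-africa-germany/REjsLgB"
connection <- "stats_Southafrica_Germany.js"

# Write a PhantomJS script that opens the page and prints its HTML
writeLines(sprintf("var page = require('webpage').create();
page.open('%s', function () {
  console.log(page.content); // page source as rendered by PhantomJS
  phantom.exit();
});", url), con = connection)

# Run the phantomjs executable (not the bin folder) and redirect its output to an HTML file
system_input <- "/Users/------/Documents/phantomjs/bin/phantomjs stats_Southafrica_Germany.js > stats_Southafrica_Germany.html"
system(system_input)
html <- "stats_Southafrica_Germany.html"
pg <- read_html(html)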
pg %>%html_nodes(xpath="//*[contains(concat( " ", @class, " " ), concat( " ", "stat-home", " " ))]")%>% html_text()

I think I am doing the node path part (html_nodes(xpath = ...)) incorrectly, but I'm not sure, to be honest. To get the xpath I used a Chrome add-on that shows the XPath of whatever you select on the page. Is this where I went wrong? The error is:

Error: unexpected string constant in "pg %>%html_nodes(xpath="//*[contains(concat( " ", @class, ""

It is raised by the final line of the code above. Can anyone help me fix my code and explain clearly what I did wrong?

Thanks

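I haven't used XPath in forever.
That said, I think you just have some syntax errors caused by the " characters inside the actual string: R treats the second " as the end of the string, which is why it reports an unexpected string constant.
Try changing all the double quotes to single quotes (') except the outer ones.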
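A minimal sketch of that change, run on a small made-up HTML snippet rather than the real page (the markup and the `home_stats` names are illustrative only). The XPath from the add-on is the usual pattern for "any element whose class list includes stat-home"; the only change is swapping its inner double quotes for single ones, so R reads the whole XPath as one string.

```r
library(rvest)  # read_html(), html_nodes(), html_text(); also re-exports %>%

# Made-up markup standing in for the saved sofascore page
pg <- read_html('<div>
  <span class="stat-home">55%</span>
  <span class="stat-away">45%</span>
  <span class="stat-home big">7</span>
</div>')

# Outer quotes stay double; every quote inside the XPath is now single
home_stats <- pg %>%
  html_nodes(xpath = "//*[contains(concat(' ', @class, ' '), concat(' ', 'stat-home', ' '))]") %>%
  html_text()

# A CSS class selector picks out the same elements with no inner quotes at all
home_stats_css <- pg %>%
  html_nodes(".stat-home") %>%
  html_text()

print(home_stats)
print(home_stats_css)
```

Both versions should return the two `stat-home` values in document order. In R 4.0.0 or later, a raw string such as `r"(...)"` is another option, since it lets the XPath keep its original double quotes.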

I know this is RStudio and R, but I never found web scraping in R to be great.
The packages in Python and JS are much easier to use and seem to be better supported.


Thanks for the reply. I think you may have got it right about the " causing issues. It still isn't working (now I get a new error), but I'm going to keep tweaking it until hopefully something works. If not, I will try Python. Cheers
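If the new error comes from the PhantomJS step rather than from rvest, it can help to test that step on its own. Below is a sketch that splits it into two small helpers. `write_phantom_script()` and `render_with_phantom()` are hypothetical names, not functions from any package, and the sketch assumes the PhantomJS executable is the file named `phantomjs` inside the `bin` folder.

```r
# Hypothetical helper: write a PhantomJS script that prints the page HTML for `url`
write_phantom_script <- function(url, script_path) {
  js <- sprintf(
    "var page = require('webpage').create();
page.open('%s', function () {
  console.log(page.content); // page source as rendered by PhantomJS
  phantom.exit();
});",
    url
  )
  writeLines(js, con = script_path)
  invisible(script_path)
}

# Hypothetical helper: run phantomjs on the script and save what it prints to `html_path`
render_with_phantom <- function(phantomjs_path, script_path, html_path) {
  # system2() takes the executable and its arguments separately and
  # redirects the program's standard output into the named file
  status <- system2(phantomjs_path, args = script_path, stdout = html_path)
  if (status != 0) stop("phantomjs exited with status ", status)
  invisible(html_path)
}
```

With the values from the question, usage would be `write_phantom_script(url, "stats_Southafrica_Germany.js")` followed by `render_with_phantom("/Users/------/Documents/phantomjs/bin/phantomjs", "stats_Southafrica_Germany.js", "stats_Southafrica_Germany.html")`. If the second call stops with a non-zero status, the problem lies with PhantomJS or the script rather than with the XPath; if it succeeds, the saved file can be passed to `read_html()` as before.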