Issues scraping with PhantomJS in R

Hi people,

I've been trying to use this guide (https://www.youtube.com/watch?v=GayFRUUtHj4&t=189s) to scrape data from a football betting site (see link in code). I want to get the statistics table for this match. I have followed the guide, and this is what I have so far:

setwd("/Users/-----/documents/phantomjs/bin")
url <- ("https://www.sofascore.com/south-africa-germany/REjsLgB")
connection <- ("stats_Southafrica_Germany.js")

writeLines(sprintf("var page = required('webpage).create();
page.open('%s', function(){
console.log(page.content); //page source
phantom.exit();
})", url), con=connection)

system_input <- "/Users/------/Documents/phantomjs/bin stats_Southafrica_Germany.js> stats_Southafrica_Germany.html"
system(system_input)
html <- "stats_Southafrica_Germany.html"
pg <- read_html(html)
pg %>%html_nodes(xpath="//*[contains(concat( " ", @class, " " ), concat( " ", "stat-home", " " ))]")%>% html_text()

I think I am doing the node path part (html_nodes(xpath=...)) incorrectly, but I'm not sure. To get the xpath= value I used a Chrome add-on that shows XPaths when you select items on the page. Is this where I went wrong? The error is this:

Error: unexpected string constant in "pg %>%html_nodes(xpath="//*[contains(concat( " ", @class, ""

This is for the final line of the code I copied. Can anyone help change my code and help me understand what I did wrong in a clear way?

Thanks

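I haven't used XPath in forever.
That said, I think you just have a syntax error caused by the " characters inside the actual string: the first inner " ends the R string early, which is why R reports an unexpected string constant.
Try changing all the quotes to single quotes ' except the outside ones.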

I know this is RStudio and R, but I never found web scraping in R to be great.
The packages in Python and JS are much easier to use and seemingly better supported.
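
To make the quoting suggestion above concrete, here is a minimal sketch. It assumes the rvest package is installed, and it uses a small made-up HTML snippet in place of the saved PhantomJS output, so the markup (and the "55%" and "45%" values) is hypothetical and the real page may be structured differently. Only the class name stat-home comes from the original code.

```r
library(rvest)  # also provides the %>% pipe

# Hypothetical stand-in for stats_Southafrica_Germany.html
html <- '<div><span class="cell stat-home">55%</span><span class="cell stat-away">45%</span></div>'
pg <- read_html(html)

# Outer string in double quotes, every XPath literal inside in single quotes
xp <- "//*[contains(concat(' ', @class, ' '), concat(' ', 'stat-home', ' '))]"
home_stats <- pg %>% html_nodes(xpath = xp) %>% html_text()

# The same class match written as a CSS selector, which needs no inner quotes
home_stats_css <- pg %>% html_nodes(".stat-home") %>% html_text()

print(home_stats)
```

The XPath produced by the add-on is just the long-hand form of a class match, so the CSS selector is probably the easier version to maintain.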

Thanks for the reply. I think you may have got it right with the " causing issues. It still isn't working (a nice new error), but I'm going to keep tweaking until hopefully something happens. If not, I will try Python. Cheers
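
The new error is not shown in the thread, so its cause is unknown. Two things in the posted script do look likely to fail regardless of the XPath quoting, though: the JavaScript calls required('webpage) (misspelled, with an unclosed quote) where PhantomJS expects require('webpage'), and the system() command points at the bin directory rather than at the phantomjs executable inside it. The sketch below is one possible way to write that step. It assumes phantomjs can be found on the PATH, which is not how the original post runs it, and it swaps system() for system2() so the output file is passed as an argument instead of through shell redirection.

```r
url <- "https://www.sofascore.com/south-africa-germany/REjsLgB"
js_file <- "stats_Southafrica_Germany.js"
html_file <- "stats_Southafrica_Germany.html"

# require (not required), with both quotes closed around 'webpage'
script <- sprintf("var page = require('webpage').create();
page.open('%s', function () {
  console.log(page.content); // page source
  phantom.exit();
});", url)
writeLines(script, con = js_file)

# Assumption: phantomjs is on the PATH. If it is not, replace this with the
# full path to the executable itself (the file inside bin, not the bin folder).
phantomjs <- unname(Sys.which("phantomjs"))

if (nzchar(phantomjs)) {
  # Run the script and capture the printed page source in html_file
  system2(phantomjs, args = js_file, stdout = html_file)
} else {
  message("phantomjs not found on the PATH; point to the executable directly")
}
```

If the run succeeds, the saved HTML file can be passed to read_html() as in the original code.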
