Update html session in R

I am trying to update an html session and move to the second page of a website, but I get an error:

Navigating to javascript:__doPostBack('ctl00$ContentPlaceHolder1$UCNested$grdRTAList','Page$2')
Error in curl::curl_fetch_memory(url, handle = handle) : Port number ended with '_'

library(rvest)

url <- "http://rtais.wto.org/UI/PublicMaintainRTAHome.aspx"

# Open the session and follow the "List of all RTAs in force" link
pgsession <- html_session(url) %>%
  follow_link(css = "#ContentPlaceHolder1_lnkRTAList")

# Try to follow the pager link labelled "2" (this is the call that fails)
pgsession <- pgsession %>%
  follow_link(xpath = "//table[@id='ContentPlaceHolder1_UCNested_grdRTAList']//a[.='2']")

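I think this is because the link inside the table is a reference to a JavaScript call and not a page, so follow_link, which uses curl, does not know what to do with it. curl appears to treat the whole href as a URL and to read the text after "javascript:" as a port number, hence the "Port number ended with '_'" message.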

From the web page, here is an <a> of the kind you are selecting:

<a id="ContentPlaceHolder1_UCNested_grdRTAList_RTAIDHyperLink_0" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$UCNested$grdRTAList$ctl02$RTAIDHyperLink','')">Comprehensive and Progressive Agreement for Trans-Pacific Partnership (CPTPP)</a>

You can see that the href is not a URL but a JavaScript call.
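To make the difference concrete, here is a minimal base R sketch that detects such an href and pulls out the two arguments of the __doPostBack call. The helper names is_js_href and parse_postback are made up for this illustration and are not part of rvest; the href is the one from the error message above.

```r
# href copied from the error message in the question
href <- "javascript:__doPostBack('ctl00$ContentPlaceHolder1$UCNested$grdRTAList','Page$2')"

# Hypothetical helper: TRUE when an href is a JavaScript call rather than a URL
is_js_href <- function(href) grepl("^javascript:", href)

# Hypothetical helper: extract the two quoted arguments of __doPostBack()
parse_postback <- function(href) {
  pattern <- "__doPostBack\\('([^']*)','([^']*)'\\)"
  m <- regmatches(href, regexec(pattern, href))[[1]]
  # m holds the full match plus two capture groups when the pattern matches
  if (length(m) != 3) return(NULL)
  list(target = m[2], argument = m[3])
}

pb <- parse_postback(href)
print(is_js_href(href))
print(pb$target)
print(pb$argument)
```

A check like is_js_href could be run on a link's href before calling follow_link, so that a JavaScript link is reported clearly instead of failing inside curl.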

I think you need another solution that can deal with JavaScript links; rvest alone is not enough here.
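One possible direction, offered only as an untested sketch: on ASP.NET WebForms pages, __doPostBack normally fills the hidden fields __EVENTTARGET and __EVENTARGUMENT and submits the page's form, so the same request can sometimes be replayed as a plain POST. The code below builds the form fields from a small stand-in HTML snippet; the hidden-field values and the helper name postback_fields are invented for illustration, and whether this site accepts such a request is an assumption.

```r
library(rvest)

# Stand-in for the real list page: a tiny form carrying the usual hidden fields.
# The values are made up; on the real page they would be long encoded strings.
html <- '<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
</form>'
page <- read_html(html)

# Hypothetical helper: collect all hidden inputs, then set the postback fields
postback_fields <- function(page, target, argument) {
  hidden <- html_nodes(page, "input[type='hidden']")
  fields <- as.list(html_attr(hidden, "value"))
  names(fields) <- html_attr(hidden, "name")
  fields[["__EVENTTARGET"]] <- target
  fields[["__EVENTARGUMENT"]] <- argument
  fields
}

fields <- postback_fields(
  page,
  target   = "ctl00$ContentPlaceHolder1$UCNested$grdRTAList",
  argument = "Page$2"
)
str(fields)

# Assumed next step (not run here): POST the fields back to the list page.
# list_url is a placeholder for the URL of the page that holds the table.
# resp  <- httr::POST(list_url, body = fields, encode = "form")
# page2 <- read_html(httr::content(resp, as = "text"))
```

If the server rejects the replayed request, a tool that drives a real browser and executes the JavaScript would be the fallback.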

Thanks for replying.
I used rvest and managed to scrape the first page along with its RTA links (such as: Comprehensive and Progressive Agreement for Trans-Pacific Partnership (CPTPP) - http://rtais.wto.org/UI/PublicShowRTAIDCard.aspx?rtaid=640), but I could not work out how to update the session with the contents of the second page.
