I am trying to update an html_session and move to the second page of a website, but I get an error:
Navigating to javascript:__doPostBack('ctl00$ContentPlaceHolder1$UCNested$grdRTAList','Page$2')
Error in curl::curl_fetch_memory(url, handle = handle) : Port number ended with '_'
library(rvest)

url <- "http://rtais.wto.org/UI/PublicMaintainRTAHome.aspx"

# Open the home page and follow the link to the list of all RTAs in force
pgsession <- html_session(url) %>%
  follow_link(css = "#ContentPlaceHolder1_lnkRTAList")

# Try to move to page 2 of the results grid (this is the step that fails with the error above)
pgsession <- pgsession %>%
  follow_link(xpath = "//table[@id='ContentPlaceHolder1_UCNested_grdRTAList']//a[.='2']")
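I think this is because the link inside the table is a reference to a JavaScript call, not to a page, so follow_link(), which fetches the href with curl, doesn't know what to do with it. Because the href has no scheme:// part, curl most likely reads "javascript" as a host name and everything after the colon as a port number, hence "Port number ended with '_'".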
Here is one of the <a> elements in that table, as it appears in the page source:
<a id="ContentPlaceHolder1_UCNested_grdRTAList_RTAIDHyperLink_0" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$UCNested$grdRTAList$ctl02$RTAIDHyperLink','')">Comprehensive and Progressive Agreement for Trans-Pacific Partnership (CPTPP)</a>
As you can see, the href is not a URL but a JavaScript call.
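To see what that call actually carries, the href can be split into its two arguments. A minimal base-R sketch; parse_postback() is a hypothetical helper, and the regular expression assumes exactly two single-quoted arguments, as in the error message and the link above:

```r
# Hypothetical helper: split a "javascript:__doPostBack('target','argument')"
# href into the two strings that ASP.NET's __doPostBack() copies into the
# hidden form fields __EVENTTARGET and __EVENTARGUMENT before submitting.
parse_postback <- function(href) {
  pattern <- "__doPostBack\\('([^']*)','([^']*)'\\)"
  m <- regmatches(href, regexec(pattern, href))[[1]]
  if (length(m) == 0) {
    stop("not a __doPostBack link: ", href)
  }
  # m[1] is the whole match, m[2] and m[3] are the captured arguments
  list(target = m[2], argument = m[3])
}

href <- "javascript:__doPostBack('ctl00$ContentPlaceHolder1$UCNested$grdRTAList','Page$2')"
str(parse_postback(href))
```

For the paging link the target is the grid control itself and the argument is "Page$2"; for the agreement link above the argument is simply empty.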
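I think you need another approach that can deal with that: either something that actually runs the JavaScript, such as a driven browser, or sending yourself the form POST that the JavaScript call would send.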
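The second route can be sketched in R by doing what __doPostBack() does in the browser: copy the hidden fields of the page's form (including __VIEWSTATE), set __EVENTTARGET and __EVENTARGUMENT to the two strings from the link, and POST the form. postback_fields() and postback_page() are hypothetical helpers written for this example, not rvest functions; the sketch assumes the grid's paging needs only the hidden fields and does not re-send visible inputs such as search boxes.

```r
library(rvest)  # read_html(), html_node(), html_nodes(), html_attr()
library(httr)   # POST(), stop_for_status(), content()

# Hypothetical helper: copy the hidden <input> fields of the page's form
# (on ASP.NET WebForms pages these include __VIEWSTATE and, usually,
# __EVENTVALIDATION) and set __EVENTTARGET / __EVENTARGUMENT the way the
# browser's __doPostBack() would. Visible inputs are deliberately not copied.
postback_fields <- function(doc, target, argument) {
  hidden <- html_nodes(doc, xpath = "//form//input[@type='hidden']")
  field_names <- html_attr(hidden, "name")
  keep <- !is.na(field_names)
  fields <- as.list(html_attr(hidden[keep], "value", default = ""))
  names(fields) <- field_names[keep]
  fields[["__EVENTTARGET"]] <- target
  fields[["__EVENTARGUMENT"]] <- argument
  fields
}

# Hypothetical helper: POST those fields to the form's action URL (falling
# back to the page's own URL) and return the parsed HTML of the response.
# Passing the session's curl handle is an assumption meant to reuse cookies.
postback_page <- function(page_url, doc, target, argument, handle = NULL) {
  action <- html_attr(html_node(doc, "form"), "action")
  post_url <- if (is.na(action) || action == "") {
    page_url
  } else {
    xml2::url_absolute(action, page_url)
  }
  resp <- POST(post_url,
               body = postback_fields(doc, target, argument),
               encode = "form",
               handle = handle)
  stop_for_status(resp)
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}
```

With the session from the question already on the list page, a call such as doc2 <- postback_page(pgsession$url, read_html(pgsession$response), "ctl00$ContentPlaceHolder1$UCNested$grdRTAList", "Page$2", handle = pgsession$handle) should return page 2, which can then be scraped the same way as page 1. That call assumes the session object exposes its last response and curl handle as $response and $handle. For page 3, pass doc2 rather than the first page, since each response carries a fresh __VIEWSTATE.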
Thanks for replying.
I used rvest and managed to scrape the first page along with its RTA links (for example, Comprehensive and Progressive Agreement for Trans-Pacific Partnership (CPTPP) - http://rtais.wto.org/UI/PublicShowRTAIDCard.aspx?rtaid=640), but I could not find a way to update the session with the contents of the second page.