TL;DR: I have a set of links that return 404 errors, but the browser's address bar shows a useful URL when the 404 page loads. Can I access that "useful URL" in R?
I'm trying to scrape data from a webpage, but I'm (understandably) getting a 404 error for the URLs below. However, the browser shows information for the 404 link that I'd like to capture. Here's the example:
library(tidyverse)
library(rvest)

url <- "http://www.uscho.com/scoreboard/division-i-men/20172018/composite-schedule/"

link_list <- url %>%
  read_html() %>%
  html_nodes("td:nth-child(13) a") %>%
  html_attr("href") %>%
  {paste0("http://www.uscho.com", .)}
Now, for example, open the 200th link (http://www.uscho.com/recaplink.php?gid=1_970_20172018) in your web browser. You'll get a 404 error page.
I don't actually want the 404 error itself, but the address bar shows a URL that, after some manipulation, I can use to get the actual webpage that I want ("https://www.uscho.com/recaps/?p=171810970").
As far as I can tell, however, this URL doesn't show up anywhere in R. Running read_html(link_list[200]) only gives me a 404 error.
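For what it's worth, here is a minimal sketch of the kind of thing I imagine might work. It assumes the browser ends up at that URL through an ordinary HTTP redirect, which I haven't confirmed. It uses the httr package (not loaded above), and final_url is just a hypothetical helper name I made up for illustration.

```r
# Hypothetical helper (not part of httr): request a link and return the
# address the request finally landed on after any HTTP redirects.
# Assumes the browser's address-bar URL comes from a server-side redirect.
final_url <- function(link) {
  resp <- httr::GET(link)  # GET() follows redirects and does not error on a 404 status
  resp$url                 # URL of the final response, after redirects
}

# Usage (needs network access), with the example link from above:
# final_url("http://www.uscho.com/recaplink.php?gid=1_970_20172018")
```

If the address changes because of JavaScript running in the page rather than a redirect, I'd expect this to just return the original link.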
Any idea how I can get the URL from the browser within R?
FYI: I asked this question on Stack Exchange earlier, but chances are it won't get answered there, and I thought this might be a better place to ask.