Hello, I'm planning a short workshop on web scraping and want the students to be able to use RStudio Cloud. When I try to connect to the site of interest to check its robots.txt file, I repeatedly get this error:
Error in curl::curl_fetch_memory(url, handle = handle) : Failed to connect to www.fanfiction.net port 443: Connection timed out
The second line ("Failed to connect ... Connection timed out") is the most important, as it is common to the several errors I've received when trying different ways to connect.
This doesn't happen when using RStudio Desktop. Is it a proxy thing? If it is, what does that mean, and what would a solution look like?
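In case it helps frame the question, here's my rough understanding of what checking for (and setting) a proxy would look like. The proxy address below is a made-up placeholder, not something I know RStudio Cloud actually uses:

# Check whether the session already has a proxy configured;
# one would normally show up in these environment variables:
Sys.getenv(c("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"))

# If a proxy were required, I believe a fix would look roughly
# like this (the address is a placeholder):
h <- curl::new_handle(proxy = "http://proxy.example.com:8080")
res <- curl::curl_fetch_memory("https://www.fanfiction.net/robots.txt", handle = h)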
Thanks for reading!
Code to reproduce my error:
#install.packages("robotstxt")
library(robotstxt)

# Fetch and parse the robots.txt file for the site
rt <- robotstxt("fanfiction.net")
I expect rt to be a list of 11 elements, which is what I get when I run this on my own computer.
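To be concrete, this is the check I'm using (11 is the length I see locally):

# On my desktop this returns 11; in the cloud it never gets this far:
length(rt)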
Trying my own site in the cloud doesn't give me an issue:
# The same call works fine from the cloud against my own site
rt_works <- robotstxt("blog.dataembassy.co.nz")
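Since the error above is raised from curl::curl_fetch_memory(), I assume a direct curl fetch, bypassing robotstxt entirely, reproduces the same timeout in the cloud:

# Fetch the robots.txt file directly with curl, with a short
# connect timeout so a failure shows up quickly:
h <- curl::new_handle(connecttimeout = 10)
res <- curl::curl_fetch_memory("https://www.fanfiction.net/robots.txt", handle = h)
cat(rawToChar(res$content))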