Before we start, from an ethical standpoint: when you're scraping a site, unless it belongs to a big player (Google, Microsoft, Yahoo, etc.), be polite and don't hammer it with requests. It could be someone's home server you end up crashing, or someone's hobby site you end up costing hundreds of dollars in billing from their provider.
That said, there are some things you might try.
First, check the site to see if they offer an API for the data you want. They probably won't, but a surprising number of sites do. If they do, learn the API; it's better for both you and them.
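If an API does exist, fetching JSON directly is usually both faster and friendlier than scraping. A minimal sketch, assuming a hypothetical endpoint at `/api/items` (the URL and query parameter are made up for illustration):

```r
library(jsonlite)

# One API call can replace dozens of scraped pages.
# "per_page" is a hypothetical parameter -- check the API docs.
items <- jsonlite::fromJSON("https://example.com/api/items?per_page=500")

# The result is typically already a data frame,
# so there is no HTML-parsing step at all.
head(items)
```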
Second, if there's no API, check whether the page you want to scrape has an option for the number of records or items shown per page. If there is, select the largest value you can. Look to see if it changes anything in the URL; sometimes you can edit this manually to view even more records at once. Or, if you can choose how many items to view but nothing changes in the URL, you might consider switching from
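When the page size does appear in the URL, you can build the request yourself. A sketch, where the parameter names (`per_page`, `page`) are assumptions — check the actual URL your browser produces when you change the items-per-page dropdown:

```r
library(rvest)

# Hypothetical URL template; substitute the real parameter names.
base_url <- "https://example.com/listings?per_page=%d&page=%d"

# Request 200 records at once instead of the default 25.
url  <- sprintf(base_url, 200, 1)
page <- read_html(url)
```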
Next, you might consider doing the work in batches: do multiple (possibly all) of your read_html() calls and save a list of page objects, then parse all of the pages later.
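The batch approach might look like the sketch below: fetch everything first (politely, with a pause between requests), then parse afterwards without touching the network again. The URL vector and the `.title` CSS selector are placeholders:

```r
library(rvest)

urls <- sprintf("https://example.com/listings?page=%d", 1:10)

# Fetch: one read_html() per page, with a delay so we don't hammer the server.
pages <- lapply(urls, function(u) {
  Sys.sleep(1)  # be polite
  read_html(u)
})

# Parse: work through the saved page objects offline.
titles <- lapply(pages, function(p) {
  html_text(html_elements(p, ".title"))
})
```

A nice side effect is that if your parsing code has a bug, you can fix it and re-run the parse step without re-downloading anything.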
Aside from that, R is primarily a single-threaded process, and I've personally not found a great way to speed up web scraping itself. But if I were going to scrape something that would take multiple days to complete, I would likely split the work and run the code on multiple Amazon EC2 instances. I'm not a parallelization expert, though, so there might be a better or easier way to do this.
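Splitting the work is mostly a matter of dividing the URL list into chunks and giving each machine one chunk. A sketch, where the worker count and chunk-file names are illustrative:

```r
# Divide 1000 page URLs among 4 workers (EC2 instances or local R sessions).
n_workers <- 4
urls <- sprintf("https://example.com/listings?page=%d", 1:1000)

chunks <- split(urls, cut(seq_along(urls), n_workers, labels = FALSE))

# Write one file per worker; each machine reads its own file and scrapes it.
for (i in seq_along(chunks)) {
  writeLines(chunks[[i]], sprintf("urls_chunk_%02d.txt", i))
}
```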