Automatic Scheduled Data Refresh




I am currently using Python to scrape HTML data every night at midnight. After the data is scraped, it is stored in an Excel file on my computer, which is overwritten with the new data each night. I would like to write R code that opens the Excel file at 3 am and automatically runs some data manipulation code I will write. What code or package could I use to accomplish this?

Thank you!


I'm assuming you are using some sort of cron job to run the Python scraper. You can use the same strategy here: write an R script and schedule it to run daily at 3 am via cron (the crontab schedule "0 3 * * *" means 3:00 every day). R already ships with the Rscript utility, which runs a script from the command line. There is also a package called littler (you can find more about it here) that does much the same thing.

As for Excel, there are the readxl (reading), writexl (writing), and openxlsx (reading and writing) packages.

However, if you don't explicitly need Excel files, I would suggest storing your data in some sort of RDBMS (SQLite, for example). If the file exists only so that Python can write it and R can read it, Excel is not the best storage format.
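
To illustrate the SQLite suggestion, here is a minimal sketch of what the Python side might look like using the standard-library sqlite3 module. The table name (scraped), its columns, and the helper functions save_rows and load_rows are all made up for the example; adapt them to whatever your scraper actually produces.

```python
import sqlite3
from datetime import date


def save_rows(db_path, rows, scraped_on=None):
    """Append (name, value) pairs to a hypothetical 'scraped' table.

    The table and column names are assumptions for this sketch.
    """
    scraped_on = scraped_on or date.today().isoformat()
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "CREATE TABLE IF NOT EXISTS scraped ("
                "scraped_on TEXT NOT NULL, name TEXT NOT NULL, value REAL)"
            )
            conn.executemany(
                "INSERT INTO scraped (scraped_on, name, value) VALUES (?, ?, ?)",
                [(scraped_on, name, value) for name, value in rows],
            )
    finally:
        conn.close()


def load_rows(db_path, scraped_on):
    """Return the (name, value) pairs stored for one scrape date, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "SELECT name, value FROM scraped WHERE scraped_on = ? ORDER BY name",
            (scraped_on,),
        )
        return cur.fetchall()
    finally:
        conn.close()
```

Because each night's rows are appended with their scrape date rather than overwriting a file, the R script can pick out just the latest batch. On the R side, the DBI and RSQLite packages can open the same database file, e.g. DBI::dbConnect(RSQLite::SQLite(), "path/to/file.db") followed by DBI::dbGetQuery().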