I need to automate the downloading of an Excel report from a company reporting platform and I plan to schedule my R script to run everyday (taskscheduleR) to pull down the report and then write it to a database table where I can query it using Tableau.
I have inspected the GET request that kicks off when I download the file (right click in chrome -> inspect -> network -> copy all as cURL cmd).
I have copied all the headers in the browser GET request into my GET request in R including the cookie details.
When I paste all this info into my R GET request (in the correct format) I can download the file of interest however the next day the working query doesn't work.
Is there some way I can stop having to paste in a new cookie or stop it from expiring?
This is the code I am using to download the file:
pacman::p_load(httr, tidyverse, readxl, keyring, curl, RCurl) address <- paste0("https://someprefix.myCompany.com/servlets/payer-reports/Blah%20COVID%20Reports/Report_Name_",gsub("-", "",Sys.Date()-2),".xlsx") request <- GET( address, add_headers( `Connection` = 'keep-alive', `Upgrade-Insecure-Requests` = '1', `DNT` = '1', `User-Agent` = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36', `Accept` = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', `Sec-Fetch-Site` = 'same-origin', `Sec-Fetch-Mode` = 'navigate', `Sec-Fetch-User` = '?1', `Sec-Fetch-Dest` = 'document', `Referer` = 'https://prefix.myCompany.com/app/oneportalMaster.html', `Accept-Language` = 'en-US,en;q=0.9', `Accept-Encoding` = 'gzip, deflate, br', `Host` = 'prefix.myCompany.com', `Cookie` = 'LotsOfCookieInfoHere'), authenticate(user = "myName", password = keyring::key_get("MSID")), write_disk(tf <- tempfile(fileext = ".xlsx"))) df <- readxl::read_xlsx(tf)
When I re-run the code above the next day and I inspect the status of the GET request I have the following 401 error
request Response [https://prefix.myCompany.com/servlets/payer-reports/Blah%20COVID%20Reports/Report_Name_thru_20200721.xlsx] Date: 2020-07-23 08:12 Status: 401 Content-Type: <unknown> <EMPTY BODY>