I need to automate the downloading of an Excel report from a company reporting platform and I plan to schedule my R script to run everyday (taskscheduleR) to pull down the report and then write it to a database table where I can query it using Tableau.
I have inspected the GET request that kicks off when I download the file (right click in chrome -> inspect -> network -> copy all as cURL cmd).
I have copied all the headers in the browser GET request into my GET request in R including the cookie details.
When I paste all this info into my R GET request (in the correct format) I can download the file of interest however the next day the working query doesn't work.
Is there some way I can stop having to paste in a new cookie or stop it from expiring?
This is the code I am using to download the file:
pacman::p_load(httr, tidyverse, readxl, keyring, curl, RCurl) address <- paste0("https://someprefix.myCompany.com/servlets/payer-reports/Blah%20COVID%20Reports/Report_Name_",gsub("-", "",Sys.Date()-2),".xlsx") request <- GET( address, add_headers( Connection = 'keep-alive', Upgrade-Insecure-Requests = '1', DNT = '1', User-Agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36', Accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,/;q=0.8,application/signed-exchange;v=b3;q=0.9', Sec-Fetch-Site = 'same-origin', Sec-Fetch-Mode = 'navigate', Sec-Fetch-User = '?1', Sec-Fetch-Dest = 'document', Referer = 'https://prefix.myCompany.com/app/oneportalMaster.html', Accept-Language = 'en-US,en;q=0.9', Accept-Encoding = 'gzip, deflate, br', Host = 'prefix.myCompany.com', Cookie = 'LotsOfCookieInfoHere'), authenticate(user = "myName", password = keyring::key_get("MSID")), write_disk(tf <- tempfile(fileext = ".xlsx"))) df <- readxl::read_xlsx(tf)
When I run the same code the following day I get the status below:
> request Response [https://prefix.myCompany.com/servlets/payer-reports/Blah%20COVID%20Reports/Report_Name_thru_20200721.xlsx] Date: 2020-07-23 08:12 Status: 401 Content-Type: <unknown> <EMPTY BODY>