Help with understanding GET requests and cookies

Hi,

I need to automate the downloading of an Excel report from a company reporting platform and I plan to schedule my R script to run everyday (taskscheduleR) to pull down the report and then write it to a database table where I can query it using Tableau.

I have inspected the GET request that kicks off when I download the file (right click in chrome -> inspect -> network -> copy all as cURL cmd).

I have copied all the headers in the browser GET request into my GET request in R including the cookie details.

When I paste all this info into my R GET request (in the correct format) I can download the file of interest however the next day the working query doesn't work.

Is there some way I can stop having to paste in a new cookie or stop it from expiring?

This is the code I am using to download the file:

pacman::p_load(httr, tidyverse, readxl, keyring, curl, RCurl)

address <- paste0("https://someprefix.myCompany.com/servlets/payer-reports/Blah%20COVID%20Reports/Report_Name_",gsub("-", "",Sys.Date()-2),".xlsx")

request <- GET(
  address,
  add_headers(
    `Connection` = 'keep-alive',
    `Upgrade-Insecure-Requests` = '1',
    `DNT` = '1',
    `User-Agent` = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36',
    `Accept` = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    `Sec-Fetch-Site` = 'same-origin',
    `Sec-Fetch-Mode` = 'navigate',
    `Sec-Fetch-User` = '?1',
    `Sec-Fetch-Dest` = 'document',
    `Referer` = 'https://prefix.myCompany.com/app/oneportalMaster.html',
    `Accept-Language` = 'en-US,en;q=0.9',
    `Accept-Encoding` = 'gzip, deflate, br',
    `Host` = 'prefix.myCompany.com',
    `Cookie` = 'LotsOfCookieInfoHere'),
  authenticate(user = "myName", 
               password = keyring::key_get("MSID")),
  write_disk(tf <- tempfile(fileext = ".xlsx")))

df <- readxl::read_xlsx(tf)

When I re-run the code above the next day and I inspect the status of the GET request I have the following 401 error

request
Response [https://prefix.myCompany.com/servlets/payer-reports/Blah%20COVID%20Reports/Report_Name_thru_20200721.xlsx]
  Date: 2020-07-23 08:12
  Status: 401
  Content-Type: <unknown>
<EMPTY BODY>

perhaps you would find it more convenient to use something like RSelenium to automate the right click action you perform in browser

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.