I need to automate the downloading of an Excel report from a company reporting platform and I plan to schedule my R script to run everyday (taskscheduleR) to pull down the report and then write it to a database table where I can query it using Tableau.
I have inspected the GET request that kicks off when I download the file (right click in chrome -> inspect -> network -> copy all as cURL cmd).
I have copied all the headers in the browser GET request into my GET request in R including the cookie details.
When I paste all this info into my R GET request (in the correct format) I can download the file of interest however the next day the working query doesn't work.
Is there some way I can stop having to paste in a new cookie or stop it from expiring?
This is the code I am using to download the file:
pacman::p_load(httr, tidyverse, readxl, keyring, curl, RCurl)
address <- paste0("https://someprefix.myCompany.com/servlets/payer-reports/Blah%20COVID%20Reports/Report_Name_",gsub("-", "",Sys.Date()-2),".xlsx")
request <- GET(
address,
add_headers(
Connection = 'keep-alive',
Upgrade-Insecure-Requests = '1',
DNT = '1',
User-Agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36',
Accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,/;q=0.8,application/signed-exchange;v=b3;q=0.9',
Sec-Fetch-Site = 'same-origin',
Sec-Fetch-Mode = 'navigate',
Sec-Fetch-User = '?1',
Sec-Fetch-Dest = 'document',
Referer = 'https://prefix.myCompany.com/app/oneportalMaster.html',
Accept-Language = 'en-US,en;q=0.9',
Accept-Encoding = 'gzip, deflate, br',
Host = 'prefix.myCompany.com',
Cookie = 'LotsOfCookieInfoHere'),
authenticate(user = "myName",
password = keyring::key_get("MSID")),
write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- readxl::read_xlsx(tf)
When I run the same code the following day I get the status below:
> request
Response [https://prefix.myCompany.com/servlets/payer-reports/Blah%20COVID%20Reports/Report_Name_thru_20200721.xlsx]
Date: 2020-07-23 08:12
Status: 401
Content-Type: <unknown>
<EMPTY BODY>