Hi,
I need to automate the downloading of an Excel report from a company reporting platform and I plan to schedule my R script to run everyday (taskscheduleR) to pull down the report and then write it to a database table where I can query it using Tableau.
I have inspected the GET request that kicks off when I download the file (right click in chrome -> inspect -> network -> copy all as cURL cmd).
I have copied all the headers in the browser GET request into my GET request in R including the cookie details.
When I paste all this info into my R GET request (in the correct format) I can download the file of interest however the next day the working query doesn't work.
Is there some way I can stop having to paste in a new cookie or stop it from expiring?
This is the code I am using to download the file:
pacman::p_load(httr, tidyverse, readxl, keyring, curl, RCurl)
address <- paste0("https://someprefix.myCompany.com/servlets/payer-reports/Blah%20COVID%20Reports/Report_Name_",gsub("-", "",Sys.Date()-2),".xlsx")
request <- GET(
address,
add_headers(
`Connection` = 'keep-alive',
`Upgrade-Insecure-Requests` = '1',
`DNT` = '1',
`User-Agent` = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36',
`Accept` = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
`Sec-Fetch-Site` = 'same-origin',
`Sec-Fetch-Mode` = 'navigate',
`Sec-Fetch-User` = '?1',
`Sec-Fetch-Dest` = 'document',
`Referer` = 'https://prefix.myCompany.com/app/oneportalMaster.html',
`Accept-Language` = 'en-US,en;q=0.9',
`Accept-Encoding` = 'gzip, deflate, br',
`Host` = 'prefix.myCompany.com',
`Cookie` = 'LotsOfCookieInfoHere'),
authenticate(user = "myName",
password = keyring::key_get("MSID")),
write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- readxl::read_xlsx(tf)
When I re-run the code above the next day and I inspect the status of the GET request I have the following 401 error
request
Response [https://prefix.myCompany.com/servlets/payer-reports/Blah%20COVID%20Reports/Report_Name_thru_20200721.xlsx]
Date: 2020-07-23 08:12
Status: 401
Content-Type: <unknown>
<EMPTY BODY>