download.file() issue, corrupted file

I am trying to create a package to download, import and clean data from the Dominican Republic Central Bank web page. I did all the coding in RStudio Cloud and everything works fine there, but when I try the functions on my local machine they do not work.

After digging into each function, I realized that the problem is the downloaded file: it is corrupt.

I am including the first steps of a function just to illustrate my issue.

file url

# Packages
library(readxl)

# file url. 
url <- paste0("https://cdn.bancentral.gov.do/documents/",
              "estadisticas/precios/documents/",
              "ipc_base_2010.xls?v=1570116997757")

# temporary path
file_path <- tempfile(pattern = "", fileext = ".xls")

# downloading 
download.file(url, file_path, quiet = TRUE)

# reading the file
ipc_general <- readxl::read_excel(
            file_path,
            sheet = 1,
            col_names = FALSE,
            skip = 7
        )

Error: 
  filepath: C:\Users\Johan Rosa\AppData\Local\Temp\RtmpQ1rOT3\2a74778a1a64.xls
  libxls error: Unable to open file

I am using a temporary file, but that is not the problem: you can download the file to your working directory and the problem persists.

I want to know:

  1. Why does this code work in RStudio Cloud and not locally?
  2. What can I do to get the job done? (alternative approach, packages, functions)

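RStudio Cloud runs on Linux servers, where download.file() makes no distinction between text and binary files. On Windows, the default mode = "w" writes the file in text mode, which changes \n line endings to \r\n and corrupts binary files such as .xls, so you need to set the mode argument of download.file() to "wb":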

download.file(url, file_path, quiet = TRUE, mode = "wb")
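As a rough illustration of why a text-mode write damages a binary file on Windows, the sketch below emulates the line-ending translation on a few raw bytes. The helper `to_text_mode()` and the payload are made up for this example; this is not how download.file() is implemented, only a picture of the effect.

```r
# Emulation only: mimics what a Windows text-mode write does to raw bytes.
# to_text_mode() is a hypothetical helper, not part of base R.
to_text_mode <- function(bytes) {
    lf <- as.raw(0x0A)
    cr <- as.raw(0x0D)
    # Replace every LF byte with the CR LF pair, leave other bytes alone
    pieces <- lapply(bytes, function(b) if (b == lf) c(cr, lf) else b)
    unlist(pieces)
}

# Made-up payload standing in for binary spreadsheet content
payload <- as.raw(c(0xD0, 0xCF, 0x0A, 0x11, 0x0A))
damaged <- to_text_mode(payload)

length(payload)              # 5
length(damaged)              # 7
identical(payload, damaged)  # FALSE
```

Every 0x0A byte that happens to occur in the binary data gains an extra 0x0D in front of it, so the offsets inside the file shift and a reader such as libxls can no longer parse it.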

Excellent, it works. Thanks!!

Now I have to think of a way to detect whether the function is running on Linux or Windows, to set that argument accordingly.

I can write a new download function using if/else calls on the result of .Platform$OS.type.

Or, can I set mode = "wb" for all download.file() calls?

Do you have any recommendations?

I don't know if this is the best way to handle this multi-OS situation, but I use a very similar approach: I use this function to get the OS and perform some tasks accordingly. Obviously, this is not a general solution, since it only covers the operating systems that I use, but it can give you an idea.

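get_os <- function(){
    # Sys.info() reports the kernel name, e.g. "Darwin", "Linux" or "Windows"
    sysinf <- Sys.info()
    os <- sysinf['sysname']
    if (os == 'Darwin'){
        os <- "osx"
    } else {
        # Start from the platform family ("unix" or "windows"),
        # then refine it using the OS string of the R build
        os <- .Platform$OS.type
        if (grepl("^darwin", R.version$os))
            os <- "osx"
        if (grepl("linux-gnu", R.version$os))
            os <- "linux"
        # Checked last because "linux-gnu" also matches this string
        if (grepl("linux-gnueabihf", R.version$os))
            os <- "raspbian"
    }
    tolower(os)
}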

This is supposed to be Windows-only but I have tested it on Ubuntu and it works normally so I would say, give it a try.
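A minimal sketch of that suggestion, assuming it is acceptable to pass mode = "wb" on every platform so that no OS check is needed. The wrapper name `download_binary()` is hypothetical, and the check at the end uses a local file:// URL with made-up bytes so that it runs without network access.

```r
# Hypothetical wrapper (not from the thread): always download in binary mode.
download_binary <- function(url, destfile = tempfile(fileext = ".xls")) {
    # download.file() returns 0 (invisibly) on success
    status <- utils::download.file(url, destfile, quiet = TRUE, mode = "wb")
    if (status != 0) stop("download failed with status ", status)
    destfile
}

# Offline check: write some made-up binary bytes and fetch them back
src <- tempfile(fileext = ".bin")
payload <- as.raw(c(0xD0, 0xCF, 0x0A, 0x0D, 0x0A, 0x00, 0xFF))
writeBin(payload, src)

# Windows paths need an extra slash after "file://"
prefix <- if (.Platform$OS.type == "windows") "file:///" else "file://"
src_url <- paste0(prefix, normalizePath(src, winslash = "/"))

copy <- download_binary(src_url)
identical(readBin(copy, "raw", n = 100), payload)
```

With the file from this thread, the call would simply be download_binary(url), with the returned path passed on to readxl::read_excel().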


When I was facing your situation (lots of foul language was involved) I ended up sidestepping the issue and downloading via curl::curl_download().

It has other benefits in addition to being platform neutral, and it is imported by httr which in turn is used by just about everyone, so I did not feel so bad about creating a package dependency.
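A minimal sketch of that approach, assuming the curl package is installed. The wrapper name `fetch_file()` is hypothetical; curl::curl_download() writes in binary mode by default (its mode argument defaults to "wb") and returns the destination path invisibly.

```r
# Hypothetical wrapper around curl::curl_download(); assumes curl is installed.
fetch_file <- function(url, destfile = tempfile(fileext = ".xls")) {
    if (!requireNamespace("curl", quietly = TRUE)) {
        stop("the 'curl' package is required for fetch_file()")
    }
    # mode defaults to "wb", so the same call works on every platform
    curl::curl_download(url, destfile, quiet = TRUE)
    destfile
}

# Usage with the URL from the question (needs network access):
# file_path   <- fetch_file(url)
# ipc_general <- readxl::read_excel(file_path, sheet = 1,
#                                   col_names = FALSE, skip = 7)
```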


I will try that function. Thanks, I did not know about curl::curl_download().
