Unable to access data from a file passed as a parameter in RMarkdown when file size is large

Hello,

I am creating an R Markdown document that lets users (1) upload a file, and (2) have R Markdown process the uploaded file to generate a report. I set

options(shiny.maxRequestSize = 9999*1024^2)

in .Rprofile, so that file size is not an issue when uploading through the Shiny user interface, as mentioned here and here.

Somehow, when the file is large (e.g., > 200 MB), knitting the R Markdown document fails with the error messages shown below, but there is no such issue when the file is small (e.g., a 1% random sample of the original file). Please note that there is no problem uploading the large file to the GUI; the problem seems to occur when R Markdown tries to access the data (temp.csv, set in the YAML header) after the file (original_data.csv) has been uploaded successfully.

Could anyone figure out why? My R version is 4.1.0, my RStudio version is 1.4.1717, and reproducible code is pasted below. Thanks so much.

Error message when using data.table::fread()

Error in fread(params$datainput) : File 'temp.csv' does not exist or is non-readable. getwd() == 'C:/Users/abc/MyFolder' Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> fread

Error message when using read.csv()

Error in file(file, "rt") : cannot open the connection Calls: ... withVisible -> eval -> eval -> read.csv -> read.table -> file In addition: Warning message: In file(file, "rt") : cannot open file 'temp.csv': No such file or directory

Code:

---
title: "TBD"
author: "TBD"
date: "TBD"
output:
  bookdown::html_document2:
    df_print: paged
params:
#=======================#
# Render Function:		#
#=======================#
  datainput:
    input: file
    label: 'Upload file:'
    value: temp.csv
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE,
                      fig.width = 12, fig.height = 8)

## Load packages
library(data.table)
library(tidyverse)
library(shiny)
```

## R Markdown

This is an R Markdown document.

```{r}
temp <- fread(params$datainput)
# Same error occurs if using read.csv()
#temp <- read.csv(params$datainput)

dim(temp)
```

That is where the files are being sought. Are they actually there?

That is the project directory, and the data I intend to upload is there (data_original.csv). However, the error from RMarkdown shows that R is looking for temp.csv (something I named arbitrarily in the YAML header), instead of data_original.csv.

Error in fread(params$datainput) : File 'temp.csv' does not exist or is non-readable. getwd() == 'C:/Users/abc/MyFolder' Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> fread

params:
#=======================#
# Render Function:		#
#=======================#
  datainput:
    input: file
    label: 'Upload file:'
    value: temp.csv

---

A bit more information. The first post said I had a problem accessing the data passed from the GUI when knitting R Markdown with parameters if the file is large (e.g., > 200 MB). To see whether I would hit the same problem with a pure Shiny interface, I wrote the simple Shiny app shown below. It turns out Shiny has no problem reading in and processing data of similar or much larger size from my local machine:

# Purpose: An app to study fileInput(), and how large shiny can upload
# Reference: https://mastering-shiny.org/action-transfer.html?q=datapath#uploading-data

library(shiny)
ui <- fluidPage(
  fileInput("upload", NULL, buttonLabel = "Upload...", multiple = TRUE),
  tableOutput("files"),
  verbatimTextOutput("vtext"),
  verbatimTextOutput("vdim")
)
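server <- function(input, output, session) {
  output$files <- renderTable(input$upload)
  # str() prints its result and returns NULL, so capture it with renderPrint()
  output$vtext <- renderPrint(str(input$upload))

  # Read the uploaded file from the temporary path Shiny stored it at
  infile <- reactive({
    req(input$upload)
    readr::read_csv(input$upload$datapath)
  })
  output$vdim <- renderText(dim(infile()))
}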

shinyApp(ui = ui, server = server)

If it’s looking for temp.csv as instructed in the YAML, that’s not surprising. Maybe get it running with a tiny csv file first to isolate that?

Thanks. However, what confuses me is that it looks for temp.csv only when the file size exceeds a certain level, but not when the file is small...

Would you care to explain a bit more about what 'isolate' means?

Isolate, as in: does it happen the same way with small files as with large ones (in which case it has to do with the YAML header, I think), or only with large ones (in which case confirm that the file temp.csv exists and is large)?
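
To make that check concrete, something like the following could be run in a chunk just before the read. It is only a sketch: `describe_input` is a made-up helper name, not part of any package, and in the report it would be called on `params$datainput`. Here it is demonstrated on a throwaway file so that it runs on its own.

```r
# Hypothetical helper: summarise what R sees at a given path.
describe_input <- function(path) {
  exists <- file.exists(path)
  list(
    path    = path,
    wd      = getwd(),                                   # where relative paths are resolved
    exists  = exists,
    size_mb = if (exists) file.size(path) / 1024^2 else NA_real_
  )
}

# In the report this would be describe_input(params$datainput).
# Self-contained demonstration with a temporary file:
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), demo_path, row.names = FALSE)

info_present <- describe_input(demo_path)
info_missing <- describe_input(file.path(tempdir(), "no_such_file.csv"))

str(info_present)
str(info_missing)
```

If `exists` is FALSE for the large upload but TRUE for the small one, the file never reached the location the document reads from, which would point away from fread() itself.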

This is even stranger. On one occasion, when running the same R Markdown document that had always failed before, it worked. It happened just as I was showing the example to one of my colleagues.

But a few minutes later, when I repeated the procedure with the same R Markdown document and the same file, it failed again. The fact that it sometimes worked but failed most of the time (9 out of 10, I would say) suggests there is some instability, but I have no idea where it comes from.

I got a segfault when using fread on a 200 MB file, but it worked fine with readr::read_csv.

Thanks. I tried readr::read_csv() (and also read.csv(), in addition to fread()) before posting this issue; however, the same issue is still there.

So, to summarize:

  1. fread works on a small temp.csv
  2. fread fails on large files on both our systems: intermittently for you, with a crash (segfault) for me
  3. readr::read_csv works in Shiny for you, and knits for me on large files
  4. readr::read_csv doesn't knit for you on large files

Here is my sessionInfo:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 21.04

Matrix products: default
BLAS:   /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_1.4.0 clipr_0.7.1  

loaded via a namespace (and not attached):
 [1] bookdown_0.24    digest_0.6.29    magrittr_2.0.1   evaluate_0.14    ncdf4_1.18      
 [6] rlang_0.4.12     stringi_1.7.6    rstudioapi_0.13  rmarkdown_2.11   tools_4.1.0     
[11] xfun_0.28        yaml_2.2.1       rsconnect_0.8.25 fastmap_1.1.0    compiler_4.1.0  
[16] htmltools_0.5.2  knitr_1.36

Thanks so much. Here is my summary (the same on both AWS and Desktop RStudio):
(1) After clicking "Knit with Parameters", fread, readr::read_csv, and read.csv() all work on a small csv file.
(2) After clicking "Knit with Parameters", fread, readr::read_csv, and read.csv() generally fail on a large file, with the errors shown in the first post.
(2-1) Occasionally fread works on a large file, but only intermittently, I would say less than 10% of the time.
(2-2) I haven't extensively tested readr::read_csv or read.csv(); perhaps they would work intermittently like fread, or perhaps they won't work at all.
(3) No problem with Shiny, for large or small files, using fread, readr::read_csv, or read.csv.
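
One way to make the comparison in (1) and (2) repeatable is to run every reader against the same path and record what each one reports. This is a rough sketch, not something tested against the failing setup: `try_readers` is a hypothetical helper name, and fread and read_csv are only tried if data.table and readr happen to be installed.

```r
# Hypothetical helper: try each available reader on the same path and
# record either "ok" or the error message it raised.
try_readers <- function(path) {
  readers <- list("read.csv" = function(p) utils::read.csv(p))
  if (requireNamespace("data.table", quietly = TRUE)) {
    readers[["fread"]] <- function(p) data.table::fread(p)
  }
  if (requireNamespace("readr", quietly = TRUE)) {
    readers[["read_csv"]] <- function(p) readr::read_csv(p)
  }
  vapply(readers, function(read) {
    tryCatch({
      # Silence warnings and messages so only hard errors are recorded
      suppressWarnings(suppressMessages(read(path)))
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Self-contained demonstration: one file that exists, one that does not.
demo_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:5, b = letters[1:5]), demo_csv, row.names = FALSE)

results_ok      <- try_readers(demo_csv)
results_missing <- try_readers(file.path(tempdir(), "no_such_file.csv"))

print(results_ok)
print(results_missing)
```

Inside the report, the call would be `try_readers(params$datainput)`. If all readers fail with a "does not exist" style message, the path is the problem rather than any particular reader.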

Session information of Desktop RStudio:

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] shiny_1.7.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7        jquerylib_0.1.4   bslib_0.3.1       later_1.3.0       pillar_1.6.4     
 [6] compiler_4.1.0    tools_4.1.0       digest_0.6.28     jsonlite_1.7.2    evaluate_0.14    
[11] lifecycle_1.0.1   tibble_3.1.4      gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.11     
[16] DBI_1.1.1         yaml_2.2.1        xfun_0.28         fastmap_1.1.0     dplyr_1.0.7      
[21] knitr_1.36        sass_0.4.0        generics_0.1.1    htmlwidgets_1.5.4 vctrs_0.3.8      
[26] hms_1.1.1         grid_4.1.0        DT_0.20           cowplot_1.1.1     tidyselect_1.1.1 
[31] glue_1.4.2        R6_2.5.1          fansi_0.5.0       bookdown_0.24     rmarkdown_2.11   
[36] ggplot2_3.3.5     purrr_0.3.4       readr_2.1.0       tzdb_0.1.2        magrittr_2.0.1   
[41] promises_1.2.0.1  scales_1.1.1      ellipsis_0.3.2    htmltools_0.5.2   rsconnect_0.8.25 
[46] assertthat_0.2.1  xtable_1.8-4      mime_0.12         colorspace_2.0-2  httpuv_1.6.3     
[51] utf8_1.2.2        munsell_0.5.0     cachem_1.0.6      crayon_1.4.2

And session information of AWS RStudio:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] shiny_1.6.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7        jquerylib_0.1.4   bslib_0.3.0       later_1.3.0       pillar_1.6.2      compiler_4.0.3   
 [7] tools_4.0.3       digest_0.6.27     jsonlite_1.7.2    evaluate_0.14     lifecycle_1.0.0   tibble_3.1.4     
[13] gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.12      DBI_1.1.1         yaml_2.2.1        xfun_0.26        
[19] fastmap_1.1.0     dplyr_1.0.7       knitr_1.34        sass_0.4.0        generics_0.1.0    htmlwidgets_1.5.4
[25] vctrs_0.3.8       hms_1.1.0         grid_4.0.3        DT_0.20           cowplot_1.1.1     tidyselect_1.1.1 
[31] glue_1.4.2        R6_2.5.1          fansi_0.5.0       bookdown_0.24     rmarkdown_2.11    ggplot2_3.3.5    
[37] purrr_0.3.4       readr_2.0.2       tzdb_0.1.2        magrittr_2.0.1    promises_1.2.0.1  scales_1.1.1     
[43] ellipsis_0.3.2    htmltools_0.5.2   assertthat_0.2.1  xtable_1.8-4      mime_0.11         colorspace_2.0-2 
[49] httpuv_1.6.3      utf8_1.2.2        munsell_0.5.0     cachem_1.0.6      crayon_1.4.2

We seem to have a mix of operating systems and slightly different R versions. Try the full path name?
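
A minimal sketch of what using the full path could look like, so that the read no longer depends on the working directory. `to_absolute` is a hypothetical helper, and the file names in the commented call are placeholders; whether this avoids the failure here is an open question.

```r
# Hypothetical helper: turn a possibly relative path into an absolute one,
# resolved against `base` (the working directory by default).
to_absolute <- function(path, base = getwd()) {
  # Treat Unix roots, "~", Windows drive letters and UNC paths as absolute
  is_abs <- grepl("^(/|~|[A-Za-z]:)", path) || startsWith(path, "\\\\")
  full <- if (is_abs) path else file.path(base, path)
  normalizePath(full, winslash = "/", mustWork = FALSE)
}

# Self-contained demonstration
demo_file <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), demo_file, row.names = FALSE)

abs_existing <- to_absolute(basename(demo_file), base = dirname(demo_file))
abs_missing  <- to_absolute("temp.csv", base = tempdir())

print(abs_existing)
print(abs_missing)

# The absolute path could also be passed directly when rendering, which
# skips the upload dialog ("report.Rmd" is a placeholder file name):
# rmarkdown::render("report.Rmd",
#                   params = list(datainput = to_absolute("data_original.csv")))
```

Rendering with an explicit `params` list is also a way to tell whether the problem lies in the upload step or in the read itself.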

Thanks, but the same issue is still there after I put the full path for the file name in the YAML header. Also, the inability to access the data after the file is uploaded is not limited to fread(); the same thing happens if read.csv or readr::read_csv is used.
