Windows 11: The action can't be completed because the file is open in RStudio R Session.

I am calling some awk code in an R script using system(). The awk code works well and very quickly: it grabs some text from a text file on disk, creates two new files on the disk, and redirects some of the text into them (I call awk because it is super fast for wrangling 20+ GB files). The two files are meant to exist only ephemerally, until their contents have been read into an R variable using vroom() in the same R code.

However, these two newly created files cannot be deleted, either by using system(paste("rm", shQuote(file))) in the R code [close(file) within the awk code doesn't help] or by selecting them and hitting Delete in Windows File Explorer. In the latter case, the error in the title appears in a Windows dialog. The two files can be opened, e.g. in WordPad, but they cannot be edited or renamed. The only fix I have found is to restart RStudio, after which the files can be deleted manually. Also, if the files are still present on the disk (i.e. I haven't restarted RStudio and deleted them), the R script halts at the awk call, because awk can't create them as they already exist.

So, how can I avoid this situation in the first place? I have used awk in R/RStudio before in similar scenarios without problems, and this is a new kind of problem for me...

How exactly are you calling vroom::vroom(), what are the column types, and how are you handling the returned object?

By default vroom is lazy and can use Altrep to read data only when it needs to be materialized. This also means that if there are columns of a type for which Altrep is enabled (by default, character), the input file remains open until the returned tibble is fully materialized, or until that tibble is no longer referenced and garbage collection has taken place.

To illustrate:

library(vroom)

# for which column types is Altrep used?
vroom_altrep()
#> Using Altrep representations for:
#>  * chr

# tmp copy of an example csv file
( tmp_csv <- tempfile(fileext = ".csv") )
#> [1] "D:/rtmp\\RtmpwNRoVT\\file5d94a255e82.csv"
file.copy(vroom_example("mtcars.csv"), tmp_csv)
#> [1] TRUE

df1 <- vroom(tmp_csv)

# list open csv files of current R process
subset(ps::ps_open_files(), grepl("\\.csv$", path))
#> # A data frame: 1 × 2
#>      fd path                                       
#> * <int> <chr>                                      
#> 1    NA "D:\\rtmp\\RtmpwNRoVT\\file5d94a255e82.csv"

# attempt to remove
file.remove(tmp_csv)
#> Warning in file.remove(tmp_csv): cannot remove file
#> 'D:/rtmp\RtmpwNRoVT\file5d94a255e82.csv', reason 'Permission denied'
#> [1] FALSE

If you remove the object (or let it fall out of scope), trigger garbage collection and retry:

rm(df1)
gc()
#>           used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells  827214 44.2    1440534   77  1440534 77.0
#> Vcells 1509664 11.6    8388608   64  2285882 17.5

subset(ps::ps_open_files(), grepl("\\.csv$", path))
#> # A data frame: 0 × 2
#> # ℹ 2 variables: fd <int>, path <chr>
file.remove(tmp_csv)
#> [1] TRUE

You can also just disable Altrep, in exchange for likely longer initial read times:


file.copy(vroom_example("mtcars.csv"), tmp_csv)
#> [1] TRUE
df2 <- vroom(tmp_csv, altrep = FALSE)
subset(ps::ps_open_files(), grepl("\\.csv$", path))
#> # A data frame: 0 × 2
#> # ℹ 2 variables: fd <int>, path <chr>
file.remove(tmp_csv)
#> [1] TRUE
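
For the ephemeral-file workflow in the question, the same idea could be wrapped in a small helper. This is only a sketch: read_and_delete() is a hypothetical name, not part of vroom, and it assumes the awk step has already written the file before the helper is called. It relies on altrep = FALSE so that nothing lazy still points at the file when it is removed.

```r
library(vroom)

# Hypothetical helper (not a vroom function): read a temporary file
# eagerly, then delete it from disk.
read_and_delete <- function(path, ...) {
  # altrep = FALSE makes vroom materialize every column up front,
  # so the returned tibble no longer depends on the file
  df <- vroom(path, altrep = FALSE, show_col_types = FALSE, ...)
  # file.remove() returns FALSE (with a warning) if the file is still locked
  if (!file.remove(path)) {
    warning("could not remove ", path)
  }
  df
}

# Usage with the example file from above; in the real script `tmp_csv`
# would be one of the files produced by the awk call.
tmp_csv <- tempfile(fileext = ".csv")
file.copy(vroom_example("mtcars.csv"), tmp_csv)
df <- read_and_delete(tmp_csv)
file.exists(tmp_csv)
```

If the initial read becomes too slow for very large files, the alternative shown earlier (keep Altrep, then rm() the object and call gc() before deleting) may be the better trade-off.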