I am calling some awk code in an R script using system(). The awk code works wonderfully and very quickly: it grabs some text from a text file on disk, creates two new files on disk, and redirects some of the text into them (I call awk because it is super fast for wrangling 20+ GB files). The two files are meant to exist only ephemerally, until their contents have been read into R variables using vroom() in the same R code. However, these two newly created files cannot be deleted, either by using system(paste("rm", shQuote(file))) in the R code [close(file) within the awk code doesn't help] or by selecting them and hitting delete in Windows File Explorer. In the latter case, the title error appears in a Windows dialogue window. The two files can be opened, e.g. in WordPad, but they cannot be edited or renamed. The only remedy I have found is to restart RStudio, after which the files can be deleted manually. Also, if the files are still present on disk (i.e. I haven't restarted RStudio and deleted them), the R script halts at the awk call, because awk can't overwrite the existing files either. So, how can I avoid this situation in the first place? I have used awk in R/RStudio before in similar scenarios without problems, and this is a new kind of problem for me...
How exactly are you calling vroom::vroom(), what are the column types, and how are you handling the returned object?
By default vroom is lazy and can use Altrep to read data only when it needs to be materialized. This also means that if there are columns of a type for which Altrep is enabled (by default, character), the input file remains open until the returned tibble is fully materialized, or until that tibble is no longer referenced and garbage collection has taken place.
To illustrate:
library(vroom)
# for which column types is Altrep used?
vroom_altrep()
#> Using Altrep representations for:
#> * chr
# tmp copy of an example csv file
( tmp_csv <- tempfile(fileext = ".csv") )
#> [1] "D:/rtmp\\RtmpwNRoVT\\file5d94a255e82.csv"
file.copy(vroom_example("mtcars.csv"), tmp_csv)
#> [1] TRUE
df1 <- vroom(tmp_csv)
# list open csv files of current R process
subset(ps::ps_open_files(), grepl("\\.csv$", path))
#> # A data frame: 1 × 2
#> fd path
#> * <int> <chr>
#> 1 NA "D:\\rtmp\\RtmpwNRoVT\\file5d94a255e82.csv"
# attempt to remove
file.remove(tmp_csv)
#> Warning in file.remove(tmp_csv): cannot remove file
#> 'D:/rtmp\RtmpwNRoVT\file5d94a255e82.csv', reason 'Permission denied'
#> [1] FALSE
If you remove the object (or let it fall out of scope), trigger garbage collection and retry:
rm(df1)
gc()
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 827214 44.2 1440534 77 1440534 77.0
#> Vcells 1509664 11.6 8388608 64 2285882 17.5
subset(ps::ps_open_files(), grepl("\\.csv$", path))
#> # A data frame: 0 × 2
#> # ℹ 2 variables: fd <int>, path <chr>
file.remove(tmp_csv)
#> [1] TRUE
You can also just disable Altrep, in exchange for likely longer initial read times:
file.copy(vroom_example("mtcars.csv"), tmp_csv)
#> [1] TRUE
df2 <- vroom(tmp_csv, altrep = FALSE)
subset(ps::ps_open_files(), grepl("\\.csv$", path))
#> # A data frame: 0 × 2
#> # ℹ 2 variables: fd <int>, path <chr>
file.remove(tmp_csv)
#> [1] TRUE
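For the workflow in the question, the eager read and the cleanup could be wrapped together. This is only a minimal sketch: read_and_remove() is a hypothetical helper name, not part of vroom, and it assumes that reading with altrep = FALSE is enough for the file handle to be released, as in the example above.

```r
library(vroom)

# Hypothetical helper (not part of vroom): read a file eagerly, then delete it.
read_and_remove <- function(path, ...) {
  # altrep = FALSE materializes all columns up front, so (as in the example
  # above) the input file should no longer be held open after the read.
  df <- vroom(path, altrep = FALSE, ...)

  # file.remove() returns FALSE (with a warning) if the file is still locked.
  if (!file.remove(path)) {
    warning("could not remove ", path)
  }
  df
}
```

It would then be called once per file written by awk, e.g. df <- read_and_remove(file), where file is the path used in the question. If lazy reading is needed for other reasons, the rm() plus gc() route shown earlier remains the alternative.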