On Linux, running RStudio Server 2022.02.3 Build 492 with R 4.2.0, rmarkdown 2.14 and knitr 1.39, knitting repeatedly produces a different random value each time.
On a Mac, running RStudio Desktop 2022.07.0 Build 548 with R 4.2.1, rmarkdown 2.14 and knitr 1.39, knitting repeatedly produces the same random value, taken from the cache...
Can you help me find what is wrong with my RStudio setup on Linux?
Yes, a cache directory is created. It changes each time I press the knit button. Interestingly, I just realised that if I run rmarkdown::render() on the file, it ignores (overwrites) the cache files produced when I press the knit button, and then if I run the command again it uses the cache as expected. Pressing the knit button after running rmarkdown::render() results again in the cache being ignored and overwritten.
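To check cache reuse independently of the Knit button, a throwaway document can be knitted twice from the console and the two outputs compared. The following is only a sketch under assumptions: the file name `cache_demo.Rmd`, the chunk label `draw` and the helper `knit_twice()` are invented for this illustration, and it calls `knitr::knit()` directly, so it does not reproduce whatever RStudio does when the button is pressed.

```r
library(knitr)

# Knit a tiny cached document twice in a scratch directory and
# report whether both runs produced identical output.
knit_twice <- function() {
  work_dir <- tempfile("cache_demo_")
  dir.create(work_dir)
  old_wd <- setwd(work_dir)
  on.exit(setwd(old_wd), add = TRUE)

  # Build the chunk fence programmatically to keep this listing readable.
  fence <- strrep("`", 3)
  writeLines(c(
    paste0(fence, "{r draw, cache=TRUE}"),
    "rnorm(1)",
    fence
  ), "cache_demo.Rmd")

  # First run computes the chunk and writes the cache.
  first <- readLines(knit("cache_demo.Rmd", quiet = TRUE))
  # Second run should load the chunk from the cache.
  second <- readLines(knit("cache_demo.Rmd", quiet = TRUE))

  identical(first, second)
}

cache_reused <- knit_twice()
cache_reused
```

If the result is `FALSE`, the random value was recomputed on the second run, which is the symptom described above.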
Regarding reproducibility, the same problem also happens on a colleague's computer (Ubuntu), and I could reproduce it again on my desktop computer after upgrading to the latest RStudio package and R 4.2.1. That said, I tried today to trigger it on my Linux laptop, but there the knitr cache worked properly...
What the two computers where the cache is ignored have in common is that R runs in a chroot environment. Can you suggest any possibly missing dependencies that I could check?
I don't really have a clue about this issue. I have never encountered it, and I don't know whether chroot could cause it.
@yihui do you have any experience with cache invalidation in specific environments?
My understanding is that only chunk options and code chunk content should be taken into account for the cache keys.
I have never seen such a situation before, and have no idea how this could possibly happen. To debug this problem, you would need to inspect the content object here: knitr/block.R at 0f0c9c26ef126db01d99ebc6008154fe48476cb3 · yihui/knitr · GitHub. See what has changed in this object between two runs, which may give us a clue as to why the cache doesn't persist.
I did not understand what you are suggesting I do with the GitHub link (sourcing it did not produce output, and running line 94 requires pasting more code from your knitr package), but I have compared the binary contents of the RData files of the cache.
First, because they differ at each run by the random seed, I added set.seed(1664) to the R Markdown document to stabilise the output. Then, the difference between two cache files generated by pressing the knit button twice in a row is:
diff -u <(zcat test_Knitbutton1.RData | hexdump -c) <(zcat test_Knitbutton2.RData| hexdump -c)
--- /dev/fd/63 2022-07-13 06:39:26.331681730 +0900
+++ /dev/fd/62 2022-07-13 06:39:26.331681730 +0900
@@ -161,9 +161,9 @@
0000a00 # 347 367 220 254 ~ ` 334 \0 \0 004 002 \0 \0 \0 001
0000a10 \0 004 \0 \t \0 \0 \0 < . b l a h _ c a
0000a20 c h e / h t m l / t e s t _ c a
-0000a30 c h e _ f 2 4 f b 4 5 5 d 7 7 b
-0000a40 2 9 5 7 e 7 0 d 9 3 2 0 9 5 5 0
-0000a50 0 2 3 6 \0 \0 \0 020 \0 \0 \0 001 \0 004 \0 \t
+0000a30 c h e _ b 2 2 9 1 4 8 0 4 7 c e
+0000a40 1 d f d c 0 3 d 5 7 3 0 f c d 3
+0000a50 8 0 5 8 \0 \0 \0 020 \0 \0 \0 001 \0 004 \0 \t
0000a60 \0 \0 \0 ; \n ` ` ` r \n s e t . s e
0000a70 e d ( 1 6 6 4 ) \n r n o r m ( 1
0000a80 ) \n ` ` ` \n \n ` ` ` \n # # [ 1
The rdb file is always empty and the rdx file never changes.
Running RMarkdown by hand always produces the same cache files and the RData in this case is not gzip-compressed.
Now you can install this debugging version via remotes::install_github('yihui/knitr@inspect-cache'). After restarting R, knit the document twice, and you should get two files with names of the form cache_2022-07-12 19:24:28.rds in the same directory as the Rmd file. You can read them via readRDS() and check the differences. If these files do not contain sensitive information, you may also upload them here and we can take a look.
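For comparing the two dumps, something along these lines might help. It is a sketch that assumes each .rds file holds a named list; the helper name `changed_elements()` and the sample values are hypothetical, and the real objects would come from `readRDS()` on the two files.

```r
# Return the names of the elements that differ between two named lists.
changed_elements <- function(a, b) {
  keys <- union(names(a), names(b))
  differs <- vapply(
    keys,
    function(k) !identical(a[[k]], b[[k]]),
    logical(1)
  )
  keys[differs]
}

# Hypothetical stand-ins for readRDS("cache_<timestamp>.rds") from two runs.
run1 <- list(cache = TRUE, fig.path = "/tmp/RtmpAAAA/figure-html/", code = "rnorm(1)")
run2 <- list(cache = TRUE, fig.path = "/tmp/RtmpBBBB/figure-html/", code = "rnorm(1)")

changed <- changed_elements(run1, run2)
changed
```

Any name reported here is a candidate for what invalidates the cache between runs.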
Okay, that explains why the cache is invalidated every time: the chunk option fig.path is set to a temp dir every time, and this temp dir is unique to each R session. If the value of a chunk option changes, the cache will be invalidated.
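To illustrate why a changing option value is enough to invalidate the cache, here is a toy model of a cache key. This is explicitly not knitr's implementation: `toy_cache_key()` is a made-up helper that hashes the chunk options together with the code using base R, and the temp-dir paths are invented examples.

```r
# Toy cache key: an MD5 hash over the chunk options and the chunk code.
# (Illustration only; knitr's real hashing is not reproduced here.)
toy_cache_key <- function(options, code) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  dput(list(options = options, code = code), file = tmp)
  unname(tools::md5sum(tmp))
}

code <- c("set.seed(1664)", "rnorm(1)")

# Same chunk, but fig.path points at a different per-session temp dir.
session_a <- list(cache = TRUE, fig.path = "/tmp/RtmpAAAA/figure-html/")
session_b <- list(cache = TRUE, fig.path = "/tmp/RtmpBBBB/figure-html/")

key_a <- toy_cache_key(session_a, code)
key_b <- toy_cache_key(session_b, code)
key_a_again <- toy_cache_key(session_a, code)

c(same_session = key_a == key_a_again, across_sessions = key_a == key_b)
```

In this model the key is stable while the options stay the same and changes as soon as `fig.path` does, which matches the differing hashes in the cache file names shown in the hexdump above.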
I don't know why this happens (i.e., why fig.path is set to a temp dir). I guess it has something to do with chroot, which might have made the directory of the Rmd file not writable, so RStudio decides to output to the temp dir. I'm not familiar with chroot, so I'm not totally sure.
If the directory of the Rmd file is writable, it should have the *_files/ and *_cache/ subdirectories, and the cache should work properly.
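A quick way to check those two conditions from R could look like the sketch below. The helper `check_rmd_dir()` is hypothetical, and the `<name>_files` and `<name>_cache` directory names are derived from the Rmd file name on the assumption that they follow the `*_files/` and `*_cache/` pattern mentioned above.

```r
# Report whether the Rmd's directory is writable and whether the
# <name>_files/ and <name>_cache/ subdirectories exist next to it.
check_rmd_dir <- function(rmd) {
  rmd_dir <- dirname(normalizePath(rmd, mustWork = FALSE))
  stem <- tools::file_path_sans_ext(basename(rmd))
  c(
    writable  = unname(file.access(rmd_dir, mode = 2) == 0),
    has_files = dir.exists(file.path(rmd_dir, paste0(stem, "_files"))),
    has_cache = dir.exists(file.path(rmd_dir, paste0(stem, "_cache")))
  )
}

# Demonstration in a scratch directory with only a cache directory present.
demo_dir <- tempfile("rmd_dir_")
dir.create(demo_dir)
file.create(file.path(demo_dir, "test.Rmd"))
dir.create(file.path(demo_dir, "test_cache"))

status <- check_rmd_dir(file.path(demo_dir, "test.Rmd"))
status
```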
I still have not pinpointed the cause of the problem but here is what I found today:
I could reproduce the problem outside the chroot on the same computer.
The directory of the vignette is always writable, regardless of whether I am inside or outside the chroot.
The problem happens in a specific RStudio project. If I exit the project or create a fresh project with the same vignette, the cache works properly (which gives me a nice escape route from the problem).
In the project where fig.path is set to a temporary directory, I moved the .Rproj.user directory so that a fresh one is generated, but this did not solve the problem.
Using an identical .Rproj configuration file in a new project is not enough to trigger the problem.
Are there other project-specific files that I could inspect to find the cause of the problem?
I finally isolated the problem! It was a long battle and I thank you for your support. A reproducible example on Linux and Mac is available at the GitHub repository charles-plessy/KnitrCacheInvalidationProblem (URL below).
In brief, if a .Rproj file specifies BuildType: Package, then the R Markdown cache will be invalidated for the Rmd files in the vignettes sub-directory, and only for them, because the computations are made in a temporary folder instead of the vignette's directory. Is that expected?
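As a rough way to tell whether a given Rmd file falls under this condition, the check can be scripted. The helpers `uses_package_build()`, `in_vignettes_dir()` and `likely_affected()` are hypothetical names for this sketch, and the rule they encode is only the behaviour I observed, not documented RStudio behaviour.

```r
# TRUE if the .Rproj file contains a "BuildType: Package" line.
uses_package_build <- function(rproj_file) {
  any(grepl("^BuildType:\\s*Package\\s*$", readLines(rproj_file)))
}

# TRUE if the Rmd file sits directly inside a directory named "vignettes".
in_vignettes_dir <- function(rmd_file) {
  basename(dirname(rmd_file)) == "vignettes"
}

# Observed rule: the cache is invalidated only when both conditions hold.
likely_affected <- function(rproj_file, rmd_file) {
  uses_package_build(rproj_file) && in_vignettes_dir(rmd_file)
}

# Demonstration with a throwaway project file.
rproj <- tempfile(fileext = ".Rproj")
writeLines(c("Version: 1.0", "", "BuildType: Package"), rproj)

affected <- likely_affected(rproj, "pkg/vignettes/KnitrCacheTest.Rmd")
unaffected <- likely_affected(rproj, "pkg/analysis/KnitrCacheTest.Rmd")
c(affected = affected, unaffected = unaffected)
```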
Just for the record, if I remove the BuildType: Package line in the .Rproj file of the project where I first experienced the issue (not the minimal reproducible example), then RStudio silently adds it back upon loading. (The project directory is indeed structured as a package, in which I use pkgdown to build vignettes that are computationally heavy.) So at the moment I cannot escape the cache problem unless I reorganise the project structure extensively. If there is a simple workaround on the RStudio side, please let me know.
I think this is something that would be worth reporting on the RStudio IDE side.
The RStudio IDE seems to detect the package build type and then does specific things, including rendering vignettes in a temp directory.
@yihui I wonder if there is a configuration we could change in that case. Is there a way to set the cache folder elsewhere? It seems tricky, as the RStudio IDE will do everything in a temp folder, and this would require an absolute path to the package dir or elsewhere.
Using the cache for vignettes seems like a specific use case, though. I am not really sure how the vignette will be built on CRAN, but the cache won't be available there either. But maybe you are not planning to release the package and are just using a package structure?
I don't think there is a configuration to prevent RStudio from rendering the vignette in a temp dir. I don't remember the reason why RStudio does this now (probably because the vignette dir needs to be kept clean).
However, since we do have access to the file path of the vignette Rmd file via knitr::current_input(dir = TRUE), it's possible to set the cache path using this path, e.g.,
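A minimal sketch of that idea, meant for a setup chunk near the top of the Rmd file. The helper `cache_dir_for()` and the `cache` directory name are assumptions made for this illustration; only `knitr::current_input(dir = TRUE)` and the `cache.path` chunk option come from knitr itself.

```r
# Build a cache directory path next to the input file (trailing slash
# included, since cache.path is used as a prefix for cache file names).
cache_dir_for <- function(input_path, subdir = "cache") {
  paste0(file.path(dirname(input_path), subdir), "/")
}

# current_input() returns NULL when no document is being knitted,
# so the option is only set during an actual knit.
input_path <- if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::current_input(dir = TRUE)
} else {
  NULL
}

if (!is.null(input_path)) {
  knitr::opts_chunk$set(cache.path = cache_dir_for(input_path))
}
```

Note that a changing fig.path also invalidates the cache, so pinning cache.path alone may not be sufficient.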
I do not see a cache/ directory but a KnitrCacheTest_cache directory instead. If I change the R command accordingly (see below), the cache is still ignored.