Replication between different machines

I am working with LDA models in R (using both topicmodels::LDA and quanteda::textmodel_lda) and noticed that the results differ slightly across different machines, even when I use set.seed(1234) and the same dataset.
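For reference, a minimal sketch of the kind of fit I am comparing across machines (`dtm` is a document-term matrix; `k` and the seed values are illustrative):

```r
# Minimal sketch of the fit being compared across machines
# (dtm is a document-term matrix; k and seed are illustrative).
library(topicmodels)

set.seed(1234)
fit <- LDA(dtm, k = 10, control = list(seed = 1234))

# Top terms per topic -- these differ slightly between machines
terms(fit, 5)
```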

So, I have a few questions:
Is this expected due to BLAS/LAPACK or low-level random number generation differences?

Is there a recommended way to enforce bit-for-bit reproducibility of LDA results across machines in R?

Would you recommend always saving fitted models with saveRDS() to ensure reproducible outputs instead of re-fitting?

Hi @jeannemoreau,

Yes, it’s expected that LDA results differ slightly across machines even with set.seed(). Differences in BLAS/LAPACK implementations, CPU instructions, compilers, and low‑level RNG behavior mean that floating‑point operations won’t be bit‑identical across systems. Even tiny numeric differences early in the algorithm can lead to small variations in the final topic distributions.
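A quick way to see whether two machines are even using the same numeric stack is to compare their linear algebra libraries (these are base R functions, nothing package-specific):

```r
# Compare these across machines: different BLAS/LAPACK builds
# are a common source of small floating-point drift.
sessionInfo()   # prints the BLAS/LAPACK libraries R links against
La_version()    # LAPACK version string
```

If the reported BLAS/LAPACK paths or versions differ, bit-identical LDA output is essentially off the table.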

There isn’t a reliable way to force perfect cross‑platform determinism in standard R. The only setups that can genuinely guarantee bit‑for‑bit identical results are fully frozen environments, such as Docker images pinned to identical BLAS builds and hardware, or running R inside WebAssembly. The webR project is a good example: because everything runs inside a WASM sandbox with a fixed math stack, results are identical across platforms, but this comes with the overhead of having to compile all necessary packages to WASM first. Looking at the webR package list, both topicmodels and quanteda are already precompiled and available for webR.
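On the saveRDS() question: yes, if downstream results need to match exactly, fitting once and sharing the serialized model is the most practical route, since a loaded object is byte-identical regardless of the machine’s math libraries. A sketch (file name is illustrative):

```r
# Fit once on one machine, then share the serialized object
# instead of re-fitting everywhere (file name is illustrative).
fit <- topicmodels::LDA(dtm, k = 10, control = list(seed = 1234))
saveRDS(fit, "lda_fit.rds")

# On any other machine: identical fitted object, no refit needed
fit <- readRDS("lda_fit.rds")
terms(fit, 5)
```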
