When doing operations on a large number of vectors (e.g., building a null distribution for a permutation test), I either get the error "vector memory exhausted (limit reached?)" or my RStudio session crashes. These problems occur with the code below.
I want to carry out operations on vectors, including subtracting one vector from all other vectors and computing the dot product between one vector and all other vectors. On my machine (R 3.6.2, 64-bit, macOS Catalina) this works with 1,000,000 vectors, so to handle more I tried to split the process up with an outer for loop: to get 2 million, I run the computation twice and merge the final results into one column. However, this strategy didn't work.
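To make the operation concrete: for a single row r, the value I am after is the dot product of (r - subtract) with multiply. A one-row toy version (the names match the example code below; r is just an illustrative stand-in for one data row):

subtract <- runif(1000)
multiply <- runif(1000)
r <- runif(1000)                          # one data row
value <- sum((r - subtract) * multiply)   # desired scalar for this row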
How can I manage memory better in this example?
Any guidance is much appreciated. (I have already tried removing objects with rm() when they are no longer needed.)
library(tibble)
library(purrr)  # map(), map2_df(), and the %>% pipe
set.seed(1)
#Example data
subtract <- runif(1000)
multiply <- runif(1000)
df_row <- runif(1000)
df <- as_tibble(matrix(sample(df_row), nrow = 1000000, ncol = 1000))  # 1,000,000 x 1,000; ~8 GB of doubles (df_row values are recycled)
# Time keeping
t1 <- Sys.time()
# List to store the final results from the outer loop
outer_list <- list()
# Outer loop (only two iterations here, but it could be increased for a larger distribution)
for (i_outer in 1:2) {
  # Split the data into smaller, more manageable chunks for the inner loop
  random_split <- list(df, df)
  # List to store the results of the inner loop
  inner_list <- list()
  for (i in seq_along(random_split)) {
    dot_products_null <- random_split[i] %>%
      # Subtract the vector from every row
      map(~ map2_df(.x, subtract, `-`)) %>%
      # Dot product of every row with multiply
      map(~ as.matrix(.x) %*% multiply) %>%
      unlist() %>%
      as_tibble()
    inner_list[[i]] <- dot_products_null  # [[ ]] so the whole tibble is stored
    rm(dot_products_null)
  }
  inner_list <- as_tibble(unlist(inner_list))
  outer_list[[i_outer]] <- inner_list
}
# Merge the final results into one column
final <- as_tibble(unlist(outer_list))
t2 <- Sys.time()
t2-t1
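For reference, here is a stripped-down sketch of the same chunking idea written with plain matrix algebra instead of purrr. The names chunk_size, const, and result are mine, purely for illustration; it relies on the identity (r - subtract) . multiply = r . multiply - subtract . multiply to fold the subtraction into one scalar. Is something like this the right direction for managing memory?

const <- sum(subtract * multiply)  # subtraction step folded into one constant
chunk_size <- 100000
n <- nrow(df)
result <- numeric(n)
for (start in seq(1, n, by = chunk_size)) {
  end <- min(start + chunk_size - 1, n)
  block <- as.matrix(df[start:end, ])              # materialise one chunk only
  result[start:end] <- as.vector(block %*% multiply) - const
  rm(block)
  gc()                                             # release memory between chunks
}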