Sometimes I write a for loop and use c() inside it.
x <- NULL
for (i in seq_len(100)) {
  x <- c(x, i)
}
When I write this code, people will say that it's not efficient.
I have heard that it is recommended to preallocate a vector of zeros instead of starting from a NULL vector, and I would like to know why.
Advanced R says the same: "if you’re generating data, make sure to preallocate the output container. Otherwise the loop will be very slow" (5 Control flow | Advanced R (hadley.nz)).
Because memory allocation is a costly operation. If you extend x via c(), R allocates a new vector on every iteration, which is many more allocations than allocating once at the start.
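To make that concrete, here is a minimal sketch comparing the two patterns on the loop from the question. The names `grown` and `prealloc` are just illustrative, and `integer(n)` is one way to create the "vector of zeros" mentioned above.

```r
n <- 100

# Growing: every c() call builds a new, longer vector and copies the old contents.
grown <- NULL
for (i in seq_len(n)) {
  grown <- c(grown, i)
}

# Preallocating: create a vector of n zeros up front, then fill it by index.
prealloc <- integer(n)
for (i in seq_len(n)) {
  prealloc[i] <- i
}

identical(grown, prealloc)
#> [1] TRUE
```

Both loops produce the same result; the difference is only in how much allocating and copying happens along the way.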
I believe this is the explanation that you are looking for:
21.3.3 Unknown output length
Sometimes you might not know how long the output will be. For example, imagine you want to simulate some random vectors of random lengths. You might be tempted to solve this problem by progressively growing the vector:
means <- c(0, 1, 2)
output <- double()
for (i in seq_along(means)) {
  n <- sample(100, 1)
  output <- c(output, rnorm(n, means[[i]]))
}
str(output)
#> num [1:138] 0.912 0.205 2.584 -0.789 0.588 ...
But this is not very efficient because in each iteration, R has to copy all the data from the previous iterations. In technical terms you get “quadratic” (O(n^2)) behaviour which means that a loop with three times as many elements would take nine (3^2) times as long to run.
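One common way around this when the output length is unknown is to store each piece in a preallocated list and combine everything once after the loop. The following is only a sketch of that idea: the name `out` is made up here, and `set.seed()` is added purely so the run is reproducible.

```r
set.seed(1)  # only for reproducibility of this sketch
means <- c(0, 1, 2)

# One list slot per iteration; each slot can hold a vector of any length.
out <- vector("list", length(means))
for (i in seq_along(means)) {
  n <- sample(100, 1)
  out[[i]] <- rnorm(n, means[[i]])
}

# Flatten once at the end instead of copying on every iteration.
output <- unlist(out)
str(output)
```

The list itself has a known length (one element per iteration), so it can be preallocated even though the total number of values cannot.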
There is also an intermediate option: start with a vector of length 1 and let it grow through indexed assignment in the loop. It is still slower than a preallocated full-length vector, but much faster than growing a NULL vector with c().
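# Grow from NULL with c()
x <- NULL
for (i in seq_len(10000)) {
  x <- c(x, i)
}
# Start at length 1 and grow by indexed assignment
y <- c(NA)
for (i in seq_len(10000)) {
  y[[i]] <- i
}
# Preallocate the full length, then fill
z <- rep(NA, times = 10000)
for (i in seq_len(10000)) {
  z[[i]] <- i
}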
identical(x, y)
#> [1] TRUE
identical(x, z)
#> [1] TRUE
rbenchmark::benchmark(
  nul_vec = {
    x <- NULL
    for (i in seq_len(10000)) {
      x <- c(x, i)
    }
  },
  uni_vec = {
    y <- c(NA)
    for (i in seq_len(10000)) {
      y[[i]] <- i
    }
  },
  ful_vec = {
    z <- rep(NA, times = 10000)
    for (i in seq_len(10000)) {
      z[[i]] <- i
    }
  }
)
#>      test replications elapsed relative user.self sys.self user.child sys.child
#> 3 ful_vec          100    0.42    1.000      0.42     0.00         NA        NA
#> 1 nul_vec          100   13.22   31.476     12.48     0.03         NA        NA
#> 2 uni_vec          100    0.68    1.619      0.67     0.00         NA        NA