Why are sapply and vapply sometimes so slow?

Hi everyone,

Could you explain why vapply and sapply are so slow in the R code example below?
I tried to simplify my original code to highlight the main problem.

sequence_of_days <- seq(Sys.Date() - 100000, Sys.Date(), by = "days")
string <- paste(rep("A", 100), collapse = "")

vec_strings_1 <- paste0(string, sequence_of_days)

vec_strings_2 <- vapply(sequence_of_days, \(x) paste0(string, x), character(1))

vec_strings_3 <- sapply(sequence_of_days, \(x) paste0(string, x))

Over 10 runs, the average time per run is roughly:

0.12s to calculate vec_strings_1
3.23s to calculate vec_strings_2
3.48s to calculate vec_strings_3

Thanks

Alain

What you see here is exactly why it is important to embrace vectorisation when using R. The built-in paste0() function already does exactly what you are trying to achieve. So, when programming in R, knowing and understanding how to apply (pun intended) built-in functions is key to speed.

In your case, there is overhead in calling one function multiple times, rather than exploiting vectorisation and calling the function one time with appropriate vectors.
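As a minimal sketch of that point, here is the same result computed element by element and in one vectorised call, on a much shorter vector. The names `prefix` and `days` are only illustrative and do not come from the original code.

```r
# Illustrative inputs (hypothetical names, not from the original post)
prefix <- "A"
days <- seq(as.Date("2024-01-01"), by = "day", length.out = 5)

# One paste0() call per element: five separate R-level function calls
one_by_one <- vapply(days, \(x) paste0(prefix, x), character(1))

# A single paste0() call: `prefix` is recycled across the whole vector
all_at_once <- paste0(prefix, days)

# Both approaches should give the same character vector
identical(one_by_one, all_at_once)
```

On five elements the difference is not measurable, but the per-call overhead grows with the length of the vector, whereas the vectorised version stays a single call.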

In fact R, despite its reputation, is not that slow. Using R wrongly, or for the wrong task, is slow.

Thanks for the answer.

I suppose it depends greatly on how the language handles data structures under the hood.
Vectors (and vectorization) and lists seem to be the cornerstone data structures of R.
So, ideally, we have to learn how these structures are transformed by the core functions in order to write optimal code.

Is that right?

Alain

I think you are caught out in particular by mixing types (Date and character) and relying on implicit type conversion for your results.
With a non-vectorised solution you pay this conversion cost again for every element. Another difference is that, when the input is a character vector, sapply and vapply name the results after the input by default (USE.NAMES = TRUE).
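Both effects can be checked on a tiny vector. This is only a sketch; `days` and `days_chr` are illustrative names, not the objects from the benchmark.

```r
# Illustrative inputs (hypothetical names)
days <- seq(as.Date("2024-01-01"), by = "day", length.out = 3)
days_chr <- as.character(days)  # convert once, up front

# Each element passed to the function is still a Date, so paste0()
# would have to convert it to character on every single call
element_classes <- vapply(days, \(x) class(x), character(1))
element_classes

# With a character input, the default USE.NAMES = TRUE names the result
# after the input strings; USE.NAMES = FALSE skips that step
named <- sapply(days_chr, \(x) paste0("A", x))
unnamed <- sapply(days_chr, \(x) paste0("A", x), USE.NAMES = FALSE)

names(named)
names(unnamed)
```

The values are the same either way; only the names attribute differs, which is why the benchmark sets USE.NAMES = FALSE for the character versions.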

sequence_of_days <- seq(Sys.Date() - 100000, Sys.Date(), by = "days")
string <- paste(rep("A", 100), collapse = "")

sequence_of_days_chr <- as.character(sequence_of_days) 


library(bench)
bench::mark(
  v1  = paste0(string, sequence_of_days),
  v2  = vapply(sequence_of_days, \(x) paste0(string, x), character(1)),
  v3  = sapply(sequence_of_days, \(x) paste0(string, x)),
  v1x = paste0(string, sequence_of_days_chr),
  v2x = vapply(sequence_of_days_chr, \(x) paste0(string, x), character(1), USE.NAMES = FALSE),
  v3x = sapply(sequence_of_days_chr, \(x) paste0(string, x), USE.NAMES = FALSE)
)
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory    
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>    
1 v1         388.75ms 399.91ms     2.50    32.81MB     1.25     2     1   799.83ms <chr>  <Rprofmem>
2 v2            3.66s    3.66s     0.273    1.53MB     9.57     1    35      3.66s <chr>  <Rprofmem>
3 v3            3.95s    3.95s     0.253    4.05MB    10.4      1    41      3.95s <chr>  <Rprofmem>
4 v1x         35.84ms  41.05ms    24.7     781.3KB     0       13     0   526.84ms <chr>  <Rprofmem>
5 v2x        352.83ms 354.71ms     2.82    781.3KB     9.87     2     7   709.42ms <chr>  <Rprofmem>
6 v3x         380.7ms    385ms     2.60     3.29MB    10.4      2     8      770ms <chr>  <Rprofmem>

If you compare the v1 and v1x versions (about 400 ms vs. 41 ms median), you can see that converting the dates to character is a costly operation in itself.

bench::mark(
  dates_to_chars = as.character(sequence_of_days) 
)

Great analysis, nirgrahamuk!

Thank you.

Alain
