rray vs matrix - performance

josiah · January 4, 2020, 9:51am

I am currently exploring the use of rray as I find myself having to work with a matrix. I'm trying to better understand the vctrs paradigm with rray.

I've noticed a rather significant performance difference between a base matrix and an rray. Is there some fundamental behavior of an rray that leads to this? Or is it a consequence of the type consistency?

# make word matrix 
lyric <- matrix(rep(c("around", "the", "world"), 10), ncol = 10, nrow = 10)

# create matrix and rray to fill for self-sim
base_mat <- matrix(nrow = 10, ncol = 10)
rry <- rray::rray(NA, dim = c(10, 10))

# function for self_similarity matrix 
loop <- function(mat) {
  mat_size <- nrow(mat)
  for (col in 1:mat_size) {
    for(row in 1:mat_size) {
      mat[row, col] <- (mat[row, col] <- lyric[row, col] == lyric[col,col])
    }
  }
  mat
}

# time
tictoc::tic()
b_self_sim <- loop(base_mat)
tictoc::toc()
#> 0.027 sec elapsed

tictoc::tic()
rr_self_sim <- loop(rry)
tictoc::toc()
#> 0.063 sec elapsed

Created on 2020-01-04 by the reprex package (v0.3.0)

davis · January 4, 2020, 9:41pm

Whew, okay, there are two things at work here. Unfortunately, both are inherent R limitations, at least until 4.0.0 is released with reference counting.

I'm fairly certain the speed difference is not really due to the implementation of rray's sub-assignment function, and has more to do with the fact that base R can take advantage of a trick that allows it to not have to copy mat every time an assignment is done with <-. Let's look at a simpler example:

library(rray)
library(profmem)

base_mat <- matrix(data = NA_real_, nrow = 10, ncol = 10)
rry <- rray(NA_real_, dim = c(10, 10))

fn <- function(x) {
  for (i in 1:100) {
    x[1, 1] <- 1
  }
}

profmem_to_tbl <- function(x) {
  out <- tibble::as_tibble(as.data.frame(x))
  # remove new page allocs
  out[!is.na(out$bytes),]
}

profmem_to_tbl(profmem(fn(base_mat)))
#> # A tibble: 1 x 3
#>   what  bytes calls
#>   <chr> <dbl> <chr>
#> 1 alloc   848 fn()

profmem_to_tbl(profmem::profmem(fn(rry)))
#> # A tibble: 200 x 3
#>    what  bytes calls                                                            
#>    <chr> <dbl> <chr>                                                            
#>  1 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  2 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  3 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  4 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  5 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  6 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  7 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  8 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  9 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#> 10 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#> # … with 190 more rows

Created on 2020-01-04 by the reprex package (v0.3.0.9000)

Here R's <- only makes 1 copy of x over the whole loop of 100 assignments. rray, on the other hand, has to make 2 copies per iteration, so 200 in total. Those copies are what kill you. There are two reasons for this.

The first is that R's base matrices can use a trick. The first time that x has a 1 assigned to it, a copy is made. The next time it happens R recognizes that the fresh copy of x is not used anywhere else, so it does not make a copy and just reuses that memory. Unfortunately that feature is not available to package developers so rray has to make at least 1 copy on every iteration.
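To see the base R side of this for yourself, here is a minimal sketch that uses only base R. It assumes an R build with memory profiling enabled, which tracemem() needs; modify_in_loop is a hypothetical helper name used for illustration and is not part of rray. tracemem() prints a line each time the tracked object is duplicated, so the copy made on the first assignment should show up. The exact number of lines can differ between R versions and front ends, so no expected output is shown.

```r
# Hypothetical helper: repeatedly sub-assign into a base matrix.
modify_in_loop <- function(x, n = 100) {
  for (i in seq_len(n)) {
    x[1, 1] <- 1
  }
  x
}

base_mat <- matrix(NA_real_, nrow = 10, ncol = 10)

# tracemem() errors on R builds without memory profiling, so guard it.
can_trace <- capabilities("profmem")
if (can_trace) {
  tracemem(base_mat)  # report every duplication of this object
}

out <- modify_in_loop(base_mat)

if (can_trace) {
  # The trace flag is carried over to copies, so clear both objects.
  untracemem(base_mat)
  untracemem(out)
}

# Copy-on-modify semantics: the caller's matrix is left untouched,
# while the returned matrix carries the assignment.
is.na(base_mat[1, 1])
out[1, 1] == 1
```

The first assignment has to copy because x is still shared with the caller's base_mat. After that, the copy belongs only to the function, which is what lets base R modify it in place.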

Side note: in R 4.0.0 there will be a "reference counting" feature that will allow package developers to keep track of the fact that x has not been "referenced" anywhere, and that it can be reused.

The other copy per iteration comes from the fact that, for whatever reason, sub-assignment with <- forces a copy on you if the object has an S3 method for [<-. For matrices it drops straight into a C implementation, but for rray objects it has to go through [<-.vctrs_rray, forcing the second copy. There isn't anything I can do about that either.
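The dispatch path itself can be illustrated with a toy S3 class. This is only a sketch: toy_array, new_toy_array, and counter are hypothetical names, and the method simply hands off to the internal default via NextMethod(), which is not how rray implements its sub-assignment. It shows that every x[1, 1] <- 1 on a classed object goes through the R-level method instead of dropping straight into C.

```r
# Hypothetical S3 class for illustration; not part of rray or vctrs.
new_toy_array <- function(x) {
  structure(x, class = "toy_array")
}

# Environment used as a mutable counter of method calls.
counter <- new.env()
counter$n <- 0L

# S3 method for sub-assignment: count the call, then defer to the
# internal default implementation.
`[<-.toy_array` <- function(x, i, j, ..., value) {
  counter$n <- counter$n + 1L
  NextMethod()
}

fn <- function(x) {
  for (i in 1:100) {
    x[1, 1] <- 1
  }
  x
}

toy <- new_toy_array(matrix(NA_real_, nrow = 10, ncol = 10))
out <- fn(toy)

# One method call per loop iteration.
counter$n
```

Whether each of those dispatches also triggers a duplicate depends on the R version. Wrapping the call in tracemem() or profmem::profmem(), as in the example earlier in this reply, is a way to check on your own build.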

Hopefully that gets better in R 4.0.0 too, but I'm not sure.

josiah · January 4, 2020, 11:31pm

Thank you! This is extremely helpful.
