The detect() function seems to be unduly slow. I wonder if I am doing something wrong, or if this slowness is simply part of the function.
In my intended application I am trying to find the first nonzero element in each (row, column) cross section of a (600, 600, 6) array. This computation will be repeated at least 1000 times in a simulation, and more likely 10,000 times. The detecting the first non-zero element is in the innermost loop, so the timing really matters.
In the reprex below I repeat just 10 times, using the detect() function two different ways, using the first() function, and using a handwritten function to do the same thing. The timing difference is dramatic. (3 minutes vs. 7 seconds) Unless I learn to more effectively use detect(), I will use the fourth method.
A couple notes:
- The real application does a couple things with the output matrix, but those are pretty fast, and irrelevant here.
- The data structure of a 3-dimensional array is natural in the intended application.
- I am fully prepared to be told that I am doing something stupid. My prior, based on over 60 years of programming experience, is that stupidity is highly likely.
- The computations were performed in R version 4.2.1, and used tidyverse 1.3.2 and purrr version 0.3.4.
# File R code timing minimal
# Bill Anderson -- October 2022
require(tidyverse)
#> Loading required package: tidyverse
########################### Functions to return the first non-zero entry in a numeric vector
# use the detect() function with an an anonymous function
FirstinCrossSection_alta <- function(vecotr){
vecotr %>% detect(~.x != 0, .default = 0)
}
# use the detect() function with a named function
nonzero <- function(x){
x != 0
}
FirstinCrossSection_altb <- function(vecotr){
vecotr %>% detect(nonzero, .default = 0)
}
# use the first() function
FirstinCrossSection_altc <- function(vecotr){
first(vecotr[vecotr != 0], default = 0)
}
# use a hand-written loop
FirstinCrossSection_altd <- function(vecotr){
for (i in 1:length(vecotr)){
if (vecotr[i] != 0) return(vecotr[i])
}
return(0)
}
############################ Generate the data
# In a real application these numbers would come from trial data
set.seed(10000)
nsubjects <- 600 # might be up to a few thousand in a real application
ncomponents <- 6 # about as large as this ever would be
nreps <- 10 # make this large enough to get randomness out of the timing
outcomevector <- sample(c(-1, 0, 1), ncomponents*nsubjects^2, prob = c(1, 1, 1), replace = TRUE)
winarray <- array(outcomevector, dim <- c(nsubjects, nsubjects, ncomponents))
########################## Evaluate the various algorithms
# use the detect() function with an an anonymous function
starttime <- Sys.time()
for (i in 1:nreps){
winmatrixa <- apply(winarray, c(1, 2), FirstinCrossSection_alta)
}
(timea <- Sys.time() - starttime)
#> Time difference of 3.405451 mins
# use the detect() function with a named function
starttime <- Sys.time()
for (i in 1:nreps){
winmatrixb <- apply(winarray, c(1, 2), FirstinCrossSection_altb)
}
(timeb <- Sys.time() - starttime)
#> Time difference of 2.566472 mins
# use the first() function
starttime <- Sys.time()
for (i in 1:nreps){
winmatrixc <- apply(winarray, c(1, 2), FirstinCrossSection_altc)
}
(timec <- Sys.time() - starttime)
#> Time difference of 12.53203 secs
# use a hand-written loop
starttime <- Sys.time()
for (i in 1:nreps){
winmatrixd <- apply(winarray, c(1, 2), FirstinCrossSection_altd)
}
(timed <- Sys.time() - starttime)
#> Time difference of 7.226184 secs
Created on 2022-10-11 with reprex v2.0.2