Hi,
I am puzzled by the timing distribution I get when benchmarking Rcpp functions with the microbenchmark package.
The R code is here:
R script
library(Rcpp)
library(microbenchmark)
library(ggplot2)
sourceCpp('test.cpp')
nrows <- 200
ncols <- 200
l <- nrows*ncols
#data <- runif(l)
r <- matrix(runif(l), nrow = nrows, ncol = ncols) # random numeric matrix
a <- matrix(numeric(l), nrow = nrows, ncol = ncols) # zero numeric matrix
b <- matrix(numeric(l), nrow = nrows, ncol = ncols) # zero numeric matrix
c <- matrix(as.numeric(1:l), nrow = nrows, ncol = ncols) # incrementing numeric matrix (column-major)
myrfunc <- function(a_in) # argument is passed by value (sigh)
{
  ncols <- ncol(a_in) # use the argument, not the global `a`
  nrows <- nrow(a_in)
  for (j in 1:ncols)
  {
    for (i in 1:nrows)
    {
      a_in[i,j] <- a_in[i,j]+i+100.0*j
    }
  }
  return(a_in)
}
allm <- microbenchmark("r function" = {a<-myrfunc(a)},
"C++ matrix_update " = {matrix_update(b)},
"C++ matrix_update_bis " = {matrix_update_bis(b)})
autoplot(allm)
ggsave("test_perf.png")
test.cpp
#include <Rcpp.h>
using namespace Rcpp;
// Column-major linear index of element (i, j) in a matrix with nr rows
// (nc is unused but kept in the signature).
inline size_t getindex(size_t i, size_t j, size_t nr, size_t nc) {
  return i + nr * j;
}

// Updates `a` in place using linear (column-major) indexing.
// [[Rcpp::export]]
void matrix_update(NumericMatrix a) {
  const size_t nr = a.nrow();
  const size_t nc = a.ncol();
  for (size_t j = 0; j < nc; j++) {
    for (size_t i = 0; i < nr; i++) {
      a[getindex(i, j, nr, nc)] += double(i + 1) + 100.0 * double(j + 1);
    }
  }
}

// Same update, using NumericMatrix's (i, j) accessor.
// [[Rcpp::export]]
void matrix_update_bis(NumericMatrix a) {
  const size_t nr = a.nrow();
  const size_t nc = a.ncol();
  for (size_t j = 0; j < nc; j++) {
    for (size_t i = 0; i < nr; i++) {
      a(i, j) += double(i + 1) + 100.0 * double(j + 1);
    }
  }
}
which gives me the following output:
The minimal and mean C++ times are OK but the max value is very large. I am a complete R beginner and I do not understand what is happening.
Thank you for your help.
Laurent
Thanks @eddelbuettel for the explanation: large max time for a Rcpp call (microbenchmark) · Issue #1157 · RcppCore/Rcpp · GitHub
That's standard R behaviour of, every now and then, requiring a call to garbage collection (i.e. function gc() from R). It would be the same if you coded the same test function 'by hand' in C or C++ and interfaced it by hand---there is nothing nefarious here that Rcpp does and that we could simply remove. Most easy fixes have, in fact, been applied by now to a project that is well over ten years old.
filter_gc
If `TRUE` remove iterations that contained at least
one garbage collection before summarizing. If `TRUE`
but an expression had a garbage collection in every
iteration, filtering is disabled, with a warning.
When you benchmark these, do you get a warning saying that gc filtering is disabled?
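The filter_gc rule quoted above can be pictured with a small standalone sketch. This is not how bench implements it: the `Iteration` and `Summary` types and the `summarize` function are hypothetical names introduced only for illustration, and the code assumes each iteration has already been tagged with whether a garbage collection ran during it. It is written in plain C++ (no Rcpp) to match test.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one benchmark iteration (not a bench type).
struct Iteration {
    double seconds;  // elapsed time of this iteration
    bool had_gc;     // true if a garbage collection ran during it
};

// Hypothetical summary of a set of timings.
struct Summary {
    double min;
    double median;
    double max;
    bool filtered;   // true only if GC iterations were actually dropped from consideration
};

// Summarize timings. With filter_gc set, iterations that contained a GC are
// removed first; if every iteration contained a GC, filtering is disabled
// and all iterations are used (a real tool would warn at this point).
Summary summarize(const std::vector<Iteration>& runs, bool filter_gc) {
    assert(!runs.empty());

    std::vector<double> t;
    if (filter_gc) {
        for (const Iteration& r : runs) {
            if (!r.had_gc) t.push_back(r.seconds);
        }
    }

    const bool filtered = filter_gc && !t.empty();
    if (!filtered) {
        // Either filtering was not requested or nothing survived it.
        t.clear();
        for (const Iteration& r : runs) t.push_back(r.seconds);
    }

    std::sort(t.begin(), t.end());
    const std::size_t n = t.size();
    const double median =
        (n % 2 == 1) ? t[n / 2] : 0.5 * (t[n / 2 - 1] + t[n / 2]);

    return Summary{t.front(), median, t.back(), filtered};
}
```

With made-up timings of 1.0, 1.2, 9.0 (with a GC) and 1.1 seconds, `summarize(runs, true)` reports a maximum of 1.2, whereas `summarize(runs, false)` reports 9.0. A single GC-affected iteration is enough to inflate the max while leaving the min and median almost unchanged.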
nirgrahamuk:
filter_gc
Thank you for the tip!
It took me a while to figure out that you were referring to a different R benchmarking package (bench):
new R script
library(Rcpp)
library(ggplot2)
library(bench)
library(beeswarm)
sourceCpp('test.cpp')
nrows <- 200
ncols <- 200
l <- nrows*ncols
#data <- runif(l)
r <- matrix(runif(l), nrow = nrows, ncol = ncols) # random numeric matrix
a <- matrix(numeric(l), nrow = nrows, ncol = ncols) # zero numeric matrix
b <- matrix(numeric(l), nrow = nrows, ncol = ncols) # zero numeric matrix
c <- matrix(as.numeric(1:l), nrow = nrows, ncol = ncols) # incrementing numeric matrix (column-major)
myrfunc <- function(a_in) # argument is passed by value (sigh)
{
  ncols <- ncol(a_in) # use the argument, not the global `a`
  nrows <- nrow(a_in)
  for (j in 1:ncols)
  {
    for (i in 1:nrows)
    {
      #cat("i=",i," j=",j, "a[",i,",",j,"]=",a_in[i,j],"\n")
      a_in[i,j] <- a_in[i,j]+i+100.0*j
      #cat("i=",i," j=",j, "a[",i,",",j,"]=",a_in[i,j],"\n")
    }
  }
  return(a_in)
}
mu=bench::mark(matrix_update(b),matrix_update_bis(b),myrfunc(a),filter_gc = TRUE, check = FALSE)
autoplot(mu)
ggsave("bench_perf.png")
It does indeed filter out the gc overhead:
Thank you again!
Ah, sorry for making a riddle of it!
I got confused because I saw another post earlier in the day where bench was used and mistook it for yours. Anyway, glad it helped.
This topic was automatically closed on April 26, 2021, 7 days after the last reply. New replies are no longer allowed.