Okay, because I am a bit depraved, I couldn't really let this drop. Here is the best I could do for element-wise matrix multiplication.
By doing the multiplication in nested, OpenMP-parallelized loops in FORTRAN, I was able to get roughly a 6x speedup. It is still destructive to the first matrix passed in (the result is written over it in place), but hey, you wanted speed, and this is the fastest I could do.
You might be able to get faster results doing the same thing in C/C++, but R and FORTRAN matrices are column-major and one-indexed, while C/C++ arrays are row-major and zero-indexed, and I didn't want to deal with that nonsense. I put it together as a tiny package if anyone wants to have a go at improving it.
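Here is a small base-R sketch that makes the layout point concrete. An R matrix is stored as a single vector filled column by column. With one-based indices, element `[i, j]` of a matrix with `nrow` rows therefore sits at linear position `i + (j - 1) * nrow`. The helper `lin_idx` is a hypothetical name used only for illustration.

```r
# Build a 2 x 3 matrix from 1..6; R fills it column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# The underlying storage is one flat vector in column-major order.
print(as.vector(m))  # prints [1] 1 2 3 4 5 6

# Hypothetical helper: one-based linear position of element [i, j].
lin_idx <- function(i, j, nrow) i + (j - 1) * nrow

# m[i, j] and m[lin_idx(i, j, nrow(m))] refer to the same element.
stopifnot(m[1, 2] == m[lin_idx(1, 2, nrow(m))])  # both are 3
stopifnot(m[2, 3] == m[lin_idx(2, 3, nrow(m))])  # both are 6
```

If the same buffer is handed to C, that element sits at the zero-based offset `(i - 1) + (j - 1) * nrow`. That offset arithmetic is the index bookkeeping a C/C++ port would have to handle.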
If you're on Windows, you'll need Rtools installed. Linux should have everything you need out of the box. On OSX (assuming R 4.0.0 or later), you'll need Apple Xcode 10.1 and GNU Fortran 8.2; if you're on an earlier version of R, you're on your own.
```r
devtools::install_github("elmstedt/elmult", force = TRUE)
```

```r
library(microbenchmark)
library(elmult)

# 3.5e5 x 120 doubles, about 336 MB per matrix
n <- 3.5e5
p <- 120
set.seed(123)
.a <- matrix(runif(n * p), nrow = n)
b <- matrix(runif(n * p), nrow = n)

microbenchmark(
  R = a * b,
  FORTRAN = em(a, b),
  # em() overwrites its first argument in place, so rebuild `a` as a
  # fresh copy of `.a` before every evaluation
  setup = a <- .a * 1,
  # confirm both expressions return identical results
  check = "identical"
)
#> Unit: milliseconds
#>     expr      min        lq      mean    median       uq      max neval
#>        R 102.9640 104.64525 129.69537 106.65625 140.4574 207.5471   100
#>  FORTRAN  20.0684  20.26365  20.48035  20.36785  20.6380  21.7892   100
```
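The size of the speedup depends on which summary you compare. The short R check below uses the mean and median timings copied from the benchmark output. The mean ratio is about 6.3x, which matches the "roughly 6x" figure. The median ratio is about 5.2x, because the plain-R timings have a long upper tail: the max is 207.5 ms against a median of 106.7 ms.

```r
# Timings in milliseconds, copied from the microbenchmark output.
mean_ms   <- c(R = 129.69537, FORTRAN = 20.48035)
median_ms <- c(R = 106.65625, FORTRAN = 20.36785)

# Speedup = time for plain R / time for the FORTRAN version.
speedup_mean   <- mean_ms[["R"]] / mean_ms[["FORTRAN"]]
speedup_median <- median_ms[["R"]] / median_ms[["FORTRAN"]]

# Roughly 6.3 on the means and 5.2 on the medians.
round(c(mean = speedup_mean, median = speedup_median), 1)
```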
Your results will vary, though. My machine has an Intel Core i7-7800X CPU with 6 hyper-threaded cores (12 threads total), and I set OpenMP to use 11 threads. A different thread count may be faster on your hardware. For reference, 2 threads got me about 30 ms and 4 threads about 20 ms, with very little improvement beyond that.
The loops are set up to work through the matrix one column at a time, moving down the rows within each column. Because R stores matrices column-major, this means memory is accessed sequentially rather than quasi-randomly.
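The loop order can be written out in plain R as a reference. This is a slow, single-threaded sketch for illustration only. `em_reference` is a hypothetical name, not the FORTRAN code in {elmult}. Unlike `em()`, it leaves its first argument intact, because R copies `a` as soon as the function modifies it. The outer loop walks the columns and the inner loop walks down the rows, so consecutive iterations touch neighbouring elements in column-major storage.

```r
# Hypothetical reference version of the element-wise product using the
# cache-friendly loop order for column-major storage:
# outer loop over columns (j), inner loop down the rows (i).
em_reference <- function(a, b) {
  stopifnot(is.matrix(a), is.matrix(b), identical(dim(a), dim(b)))
  for (j in seq_len(ncol(a))) {      # one column at a time
    for (i in seq_len(nrow(a))) {    # contiguous elements within column j
      a[i, j] <- a[i, j] * b[i, j]
    }
  }
  a  # a modified copy; the caller's matrix is untouched
}

set.seed(1)
x <- matrix(runif(20), nrow = 5)
y <- matrix(runif(20), nrow = 5)
stopifnot(identical(em_reference(x, y), x * y))
```

In an OpenMP version, a natural choice is to parallelize the outer column loop so that each thread handles whole, contiguous columns. Whether {elmult} splits the work exactly this way is an assumption here and not something the benchmark shows.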
Created on 2020-09-13 by the reprex package (v0.3.0)
EDIT: Also, you should remove the {dplyr} tag, as nothing in your question or in any of the answers relates to {dplyr}.