I have a tibble with m rows and n columns (all containing numeric values) and an m-length numeric vector. I want to divide the i-th row of the tibble by the i-th element of the vector (i ranges from 1 to m). What would be the most efficient way to do this?
# iris has 150 rows and 5 columns
# let's make a vector with 150 entries to divide each petal length by.
# in this case I'll make it 2 * petal length,
# so we can see that the final calculation happens
dim(iris)
head(iris)
(myvec <- 2* iris$Petal.Length)
(iris$new_calc <- iris$Petal.Length / myvec)
Perhaps I should have been more specific. What I want is to divide all columns in the i-th row by the i-th element of the vector, not just one column. Below is my example code to do this.
library(dplyr)     # tibble(), mutate_all()
library(magrittr)  # %<>% assignment pipe
my_data = tibble(a=1:10, b=1:10*2, c=1:10*3)
my_vector = 1:10
my_data %<>% mutate_all(~./my_vector)
This code does exactly what I want, but I guess it's not the most efficient way to do this when my_data is very large, because the calculation is repeated along the columns. Is there any alternative way?
I don't understand this comment. Either you want to perform a calculation on each cell of your original table, or you don't?
I do. What I meant was this: If I do this,
my_data %<>% mutate_all(~./my_vector)
I think it's same as doing this.
my_data %<>% mutate(a=a/my_vector, b=b/my_vector, c=c/my_vector, ...)
Thus, if my_data has 10 thousand columns, the vectorized division is also performed 10 thousand times, which made me guess this is not the most efficient way to do this particular job.
Sorry, I think we are at an impasse.
If you want 10,000 columns divided, I don't see another option than writing code that divides them all.
Unless you are hinting that you expect certain columns to be duplicates, in which case you might optimise by skipping these or borrowing the values from similar columns. But in the general case, no, there's no way to do all the calculations you said you wish to do without actually doing them.
However, it is the case that tidyverse/dplyr trades away some performance for its friendly syntax.
So you could do all the divisions faster by using the data.table package, which has its own syntax that you can learn. Also, lately there have been improvements to the dtplyr package (extra t between the d and the p), which lets you write your instructions in dplyr syntax but uses data.table to do the calculations.
I think you would see a speed-up with the data.table back end. However, it would still be performing numerous divisions (as it seems that's what you require).
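For anyone curious what the data.table route might look like, here is a minimal sketch using the same toy data as above. It assumes the data.table package is installed; the names dt and cols are made up for illustration. It still divides every column once, it just updates them by reference. Whether it is actually faster for a given data set would need to be measured.

```r
library(data.table)

# Same toy data as the tibble example, stored as doubles.
dt <- data.table(a = as.numeric(1:10),
                 b = as.numeric(1:10 * 2),
                 c = as.numeric(1:10 * 3))
my_vector <- 1:10

# Columns to update (here: all of them).
cols <- names(dt)

# Divide each selected column by my_vector, assigning by reference.
# Each column is still divided once; only the copying overhead changes.
dt[, (cols) := lapply(.SD, function(x) x / my_vector), .SDcols = cols]
```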
Thank you for your quick and informative answers!
You're very welcome. And if you find a particular solution you favour, you can come back and share the knowledge.
This is a lot more straightforward:
my_data = matrix(c(1:10, 1:10*2, 1:10*3), nrow = 10, ncol = 3)
my_vector = 1:10
my_data <- my_data/my_vector
Of course, the matrix data structure would be much faster. Doh!
Thank you. The best solution seems to be to first convert my_data into a matrix and then perform the calculation.
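A rough sketch of that convert-then-divide idea, in base R only. A plain data.frame stands in for the tibble here (an assumption, so the snippet needs no packages); as.matrix() accepts a tibble as well. The names mat, result_mat and result are just for illustration.

```r
# Stand-in for the tibble from the earlier example (plain data.frame here).
my_data <- data.frame(a = 1:10, b = 1:10 * 2, c = 1:10 * 3)
my_vector <- 1:10

# Convert to a numeric matrix; this assumes every column is numeric.
mat <- as.matrix(my_data)

# The vector is recycled down the columns, so row i is divided by
# my_vector[i] as long as the vector has one entry per row.
stopifnot(length(my_vector) == nrow(mat))
result_mat <- mat / my_vector

# Convert back if a data frame is needed afterwards.
result <- as.data.frame(result_mat)
```

If the result has to be a tibble again, tibble::as_tibble(result_mat) should do the conversion back.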
Something like this @junghoonshin?
m = 10
n = 6
X = matrix(data = round(rnorm(m*n), 1), nrow = m, ncol = n)
v = round(rnorm(m), 1)
Y = t(sapply(seq(1, m), function(i){
return(X[i,] / v[i])
}))
Hope it helps
X/v is all that's required in this example. No loops, no apply functions.
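The reason X/v works: R stores a matrix column by column and recycles the shorter operand in that order, so when length(v) equals nrow(X), v[i] lines up with row i in every column. Below is a small sketch that checks this against the loop version and against sweep(). The names Y_loop, Y_sweep, w, Z and Z_t are introduced here only for illustration, and v is drawn away from zero to keep the division well behaved.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
m <- 10
n <- 6
X <- matrix(rnorm(m * n), nrow = m, ncol = n)
v <- runif(m, min = 1, max = 2)  # one divisor per row, kept away from zero

# Direct division: v is recycled down each column.
Y <- X / v

# Explicit row-by-row version, for comparison.
Y_loop <- t(sapply(seq_len(m), function(i) X[i, ] / v[i]))

# sweep() states the intent explicitly: margin 1 means "per row".
Y_sweep <- sweep(X, 1, v, "/")

stopifnot(isTRUE(all.equal(Y, Y_loop)))
stopifnot(isTRUE(all.equal(Y, Y_sweep)))

# Caution: dividing each COLUMN by a length-n vector is a different case.
# X / w would recycle w down the columns, which is not what is wanted there.
w <- runif(n, min = 1, max = 2)
Z <- sweep(X, 2, w, "/")   # margin 2 means "per column"
Z_t <- t(t(X) / w)         # equivalent trick via a double transpose
stopifnot(isTRUE(all.equal(Z, Z_t)))
```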
Right you are - Nice!