I am trying to use a purrr approach to make pairwise calculations with specific columns of a dataframe and I am wondering if it is a good idea in terms of speed and memory efficiency. The steps I have followed are:
Create a dataframe using expand.grid() that contains the column names I want to use in each calculation.
The previous code is just an example to illustrate the idea. What I am wondering is whether the dataframe is being copied on each iteration, which would make this an inefficient strategy.
If you can put it into a reprex, it'll help to move things along.
As of now, I'm not really certain what exactly you are trying to achieve with map2_dbl. Do you want to take 2 values from each column and combine them in some way? There is the base::rowMeans function, which seems to do what you want without any mapping.
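For illustration, here is a minimal sketch of the rowMeans idea, assuming the goal is a row-wise mean over one pair of columns. The data frame `df` and its column names are made up for the example.

```r
# Hypothetical toy data; the column names a, b, c are assumptions
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Row-wise mean of one pair of columns, with no explicit mapping
pair_means <- rowMeans(df[, c("a", "b")])
```

This only covers the case where the calculation is a mean. Any other pairwise calculation would need a different function.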
Hi mishabalyasin, thanks for your answer. The reason I didn't attach a reprex is that my question is not about a coding problem but about a theoretical point. Anyway, I will code a toy example and send it.
Let me try to explain my question better. Let's say I have a dataframe with 10 columns. I want to make a calculation using specific pairs of columns, but not between all of them. My approach to achieve it is:
Create a custom_function that receives as arguments the names of two columns and a dataframe, and returns the value of the calculation.
Store in two vectors the names of the columns that compose each pair of interest. For example, vectorA[1] and vectorB[1] form the first pair of columns.
Using map2(), pass both vectors as .x and .y, and custom_function as .f, with the dataframe as an additional argument.
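The steps above can be sketched roughly as follows. This is only a toy version: the data, the column names V1..V10, the chosen pairs, and the use of cor() as the calculation are all assumptions made for illustration, and map2_dbl() is used on the assumption that each calculation returns a single number.

```r
library(purrr)

# Hypothetical toy data: 10 numeric columns named V1..V10
set.seed(1)
df <- as.data.frame(matrix(rnorm(50), nrow = 5, ncol = 10))

# Placeholder calculation (an assumption): correlation between two named columns
custom_function <- function(col_a, col_b, data) {
  cor(data[[col_a]], data[[col_b]])
}

# vectorA[i] and vectorB[i] together form the i-th pair of interest
vectorA <- c("V1", "V3", "V5")
vectorB <- c("V2", "V4", "V10")

# The data frame is forwarded to custom_function as an extra named argument
results <- map2_dbl(vectorA, vectorB, custom_function, data = df)
```

Here `results` holds one value per pair, in the same order as the pairs in vectorA and vectorB.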
Although this works, I'm wondering if it is efficient. Is the dataframe being copied in each call to custom_function? In other words, if there is a dataframe in the current environment and I pass it as an argument to map(), is the dataframe copied on each iteration?
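One way to check this empirically is base R's tracemem(), which prints a message whenever the traced object is duplicated. The sketch below assumes an R build with memory profiling enabled (hence the capabilities("profmem") guard). The functions `read_only` and `modifies` are hypothetical examples. Under R's copy-on-modify semantics, a function that only reads the dataframe should not trigger a copy, while one that modifies its argument should.

```r
library(purrr)

# Hypothetical toy data
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Only reads columns from the data frame
read_only <- function(col_a, col_b, data) {
  sum(data[[col_a]] * data[[col_b]])
}

# Modifies its local argument, which forces a copy inside the function
modifies <- function(col_a, col_b, data) {
  data[[col_a]] <- data[[col_a]] * 2
  sum(data[[col_a]] * data[[col_b]])
}

# tracemem() is only available when R was built with memory profiling
if (capabilities("profmem")) tracemem(df)

r1 <- map2_dbl("a", "b", read_only, data = df)  # no tracemem message expected
r2 <- map2_dbl("a", "b", modifies, data = df)   # a tracemem message is expected here

if (capabilities("profmem")) untracemem(df)
```

In either case the original `df` in the calling environment is left unchanged. The copy, when there is one, is local to the function call.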