Is it possible to embed mutate WITHIN ifelse (not the reverse)?

ofatunde · July 27, 2020, 2:02am

Hello all,

I know that it's possible to embed an ifelse statement within a dplyr mutate call, but in this case I am running customized regressions on customized datasets, each generated from a combination of parameters specified on a different row of a data frame. I would like to add to each customized dataset only the variables that are going to be used in that customized regression. In other words, I want to start with a baseline dataset and then selectively mutate to add new columns only if parameters from the previous columns hold certain values.

Say I have a parameter column called "fixed.effects" which can take on values of 1, 2, or 3. I want to use mutate to add dummy variables from block A, B, or C, corresponding to the value of fixed.effects. To add a dummy variable, I use an ifelse statement inside mutate to check the value of other variables within the customized (nested) data frame. But how can I get my function to check the value of the fixed.effects column, which lives outside the nested data frame (it's one of the variables used in group_by), and perform only the mutate that corresponds to a given value of fixed.effects?

In terms of how I plan to execute this: I have a function with the mutate call inside it, and I plan to use the purrr::map() family to apply this function row by row to each row of parameter combinations (including the values of fixed.effects). I'm hoping that the value of fixed.effects can serve as an input to this function so that it knows which columns to add.
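A minimal sketch of that plan, assuming each parameter row carries its own dataset in a list-column called `data` (in practice this might come from `group_by()` plus `tidyr::nest()`). The column names `group` and `y`, the dummy names, and the helper `add_dummies_for()` are all hypothetical. Because `purrr::map2()` walks the `data` and `fixed.effects` columns in parallel, the function simply receives the outer-level `fixed.effects` value as an ordinary scalar argument:

```r
library(dplyr)
library(purrr)

# Hypothetical baseline data standing in for the question's "base data".
base_data <- tibble(
  group = c("a", "b", "a", "b"),
  y     = c(1.2, 2.3, 1.9, 2.8)
)

# One row per parameter combination. `data` is a list-column holding each
# row's own dataset (in practice it might come from group_by() + nest()).
param_grid <- tibble(
  fixed.effects = 1:3,
  data          = list(base_data, base_data, base_data)
)

# Hypothetical helper: receives the outer-level fixed.effects value as a
# plain scalar and adds only the columns that specification needs.
add_dummies_for <- function(data, fixed.effects) {
  if (fixed.effects >= 2) {
    data <- mutate(data, dummy_b = as.integer(group == "b"))
  }
  if (fixed.effects == 3) {
    data <- mutate(data, y_high = as.integer(y > 2))
  }
  data
}

# map2() walks the `data` and `fixed.effects` columns in parallel, row by row.
param_grid <- param_grid %>%
  mutate(data = map2(data, fixed.effects, add_dummies_for))
```

Here the row with `fixed.effects = 1` keeps only the original columns, while rows 2 and 3 gain one and two extra columns respectively.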

Thanks in advance, and sorry if this is a silly question...

ofatunde · July 27, 2020, 2:32am

Update: I should mention that I have tried this:

```
add.dummies <- function(fixed.effects) {
  final.data.frame.for.regression <- ifelse(
    fixed.effects == 3, base_data %>% mutate(<add variables for specification 3>),
    ifelse(fixed.effects == 2, base_data %>% mutate(<add variables for specification 2>),
           base_data %>% mutate(<add variables for specification 1>)))
}
```

In addition to making the code clumsy and very long, this results in the following error message:

"Error: x and y must share the same src, set copy = TRUE (may be slow)."
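Independently of where that particular error comes from, base R's `ifelse()` is vectorised: it returns a result with the same length as its `test` argument, so with a single `fixed.effects` value it cannot hand back a whole data frame. Scalar control flow (`if`/`else` or `switch()`) is the usual way to choose one of several pipelines. A minimal sketch of the nested `ifelse()` attempt rewritten with `switch()`; the column names `block_a`, `block_b`, `outcome` and the dummy definitions are hypothetical:

```r
library(dplyr)

# Hypothetical baseline data; column names are illustrative only.
base_data <- tibble(
  block_a = c(1, 2, 3, 1),
  block_b = c("x", "y", "x", "y"),
  outcome = c(2.1, 3.4, 1.8, 2.9)
)

# switch() evaluates only the branch that matches the single fixed.effects
# value and returns that branch's whole data frame.
add_dummies <- function(data, fixed.effects) {
  switch(as.character(fixed.effects),
    "1" = data %>% mutate(dummy_a = as.integer(block_a == 1)),
    "2" = data %>% mutate(dummy_b = as.integer(block_b == "x")),
    "3" = data %>% mutate(dummy_a = as.integer(block_a == 1),
                          dummy_b = as.integer(block_b == "x")),
    stop("fixed.effects must be 1, 2 or 3")  # unnamed last argument = default
  )
}

spec2_data <- add_dummies(base_data, 2)
```

Because only the matching branch is evaluated, each specification only ever builds the columns it needs.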

I have also tried simply adding all variables to the datasets for all rows, even though the regressions for some rows will not use all of them. This works, but it puts many unnecessary columns into data frames that are regenerated multiple times during bootstrapping and nested into a larger data frame. The result is a very heavy object that often gives me memory issues and causes R to crash.
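One way to keep the result light, sketched here under the assumption that only the coefficients are needed downstream: build each bootstrap sample and its specification-specific columns inside the function that fits the model, and return only `coef()`. The resampled data frame then stays local to the function instead of being stored in a nested result. The data, the helper `fit_one()`, and the two specifications are hypothetical; `slice_sample()` requires dplyr 1.0.0 or later.

```r
library(dplyr)
library(purrr)

set.seed(1)
# Hypothetical baseline data for illustration.
base_data <- tibble(
  x     = rnorm(50),
  group = sample(c("a", "b"), 50, replace = TRUE),
  y     = 1 + 2 * x + rnorm(50)
)

# Draw one bootstrap sample, add only the columns this specification needs,
# fit the model, and return just the coefficient vector. The resampled data
# frame is local to the function and is not kept in the result.
fit_one <- function(fixed.effects) {
  boot_data <- slice_sample(base_data, prop = 1, replace = TRUE)
  if (fixed.effects == 1) {
    coef(lm(y ~ x, data = boot_data))
  } else {
    boot_data <- mutate(boot_data, dummy_b = as.integer(group == "b"))
    coef(lm(y ~ x + dummy_b, data = boot_data))
  }
}

# 100 bootstrap replicates for each of two specifications.
coefs <- map(rep(1:2, each = 100), fit_one)
```

Returning `coef()` rather than the `lm` object also avoids storing each fit's model frame, which `lm()` keeps by default.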

So I'm wondering if anyone would suggest an alternative. Thanks!

nirgrahamuk · July 27, 2020, 9:33am

I found it a challenge to identify exactly what you are trying to accomplish and how.
I've created a reprex that I think addresses the sort of thing you want to do.
Anyhow, it's a basis for discussion: perhaps you can create your own reprex based on my demonstration, and we can continue to discuss your needs and approach.



```r
library(dplyr)
library(purrr)

# Let's analyse iris: we make different versions of iris with
# both different rows (based on filtering criteria) and different
# dummy variables (based on parameterised cutoffs).
# Step 1: make a data.frame containing the parameters that
# differentiate how we will analyse the same base data in different ways.

myparams <- tibble(
  Species_to_include = list(
    "setosa",
    c("virginica", "versicolor")
  ),
  Petal.width_dummy_cut = c(.1, 2),
  formula_to_use = list(
    Petal.Length ~ petal_width_dummy,
    Petal.Length ~ Species + petal_width_dummy
  )
)

# Step 2: build one filtered dataset per parameter row and add its dummy.
prepped_dfs <- map(
  1:nrow(myparams),
  ~ filter(iris, Species %in% unlist(myparams[.x, "Species_to_include"])) %>%
    mutate(petal_width_dummy = Petal.Width > unlist(
      myparams[.x, "Petal.width_dummy_cut"]))
)

# Step 3: fit each formula to its matching dataset.
regress_results <- map2(
  .x = myparams$formula_to_use,
  .y = prepped_dfs,
  ~ list("formula_used" = .x,
         "lm" = lm(data = .y, formula = .x))
)
```
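For comparison, the same idea can be written with `purrr::pmap()`, which passes each column of `myparams` to a function by name, so that filtering, adding the dummy, and fitting happen in one pass per parameter row. This is a sketch assuming dplyr and purrr are installed; `fit_spec()` is a hypothetical helper whose argument names must match the column names of `myparams`:

```r
library(dplyr)
library(purrr)

myparams <- tibble(
  Species_to_include    = list("setosa", c("virginica", "versicolor")),
  Petal.width_dummy_cut = c(.1, 2),
  formula_to_use        = list(
    Petal.Length ~ petal_width_dummy,
    Petal.Length ~ Species + petal_width_dummy
  )
)

# Argument names mirror myparams' columns, because pmap() matches by name.
fit_spec <- function(Species_to_include, Petal.width_dummy_cut, formula_to_use) {
  prepped <- iris %>%
    filter(Species %in% Species_to_include) %>%
    mutate(petal_width_dummy = Petal.Width > Petal.width_dummy_cut)
  list(formula_used = formula_to_use,
       lm = lm(formula_to_use, data = prepped))
}

regress_results <- pmap(myparams, fit_spec)
```

Each element of `regress_results` holds the formula and the fitted model for one parameter row; adding a new specification only means adding a row to `myparams`.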

system · August 17, 2020, 9:33am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.