Name injection/embrace operator puzzle for a recipe

pathos · November 26, 2021, 8:37am

I looked at these two links:

I got the embrace operator working inside my own function, but I can't work out which form of name injection to use elsewhere. In the code below, square brackets [ ] mark the places where I'm stuck.

library(tidymodels)

original_data = data.frame(yy = runif(50, 1, 100),
                           xx = runif(50, 1, 100),
                           zz = runif(50, 1, 100))

y_var = 'yy'
variable_of_interest = 'xx'

print(paste0('Variable of interest is ', variable_of_interest))

select_variable_of_interest = function(ddff, aa, bb) {
  ddff |>
    select( {{ aa }} , {{ bb }} )
}

df = select_variable_of_interest(original_data, {{y_var}}, {{variable_of_interest}})

df_split = initial_split(df)
df_train = training(df_split)

recc = recipe( [y_var] ~ [variable_of_interest], data = df_train) |>
  step_normalize( [variable_of_interest] )

Judging by this Stack Overflow question, the main problem seems to be with the recipe steps: r - wrap tidymodels recipe into function - Stack Overflow

Is there a list of roles I can use with update_role? I would also like to use step_tokenize for NLP, for example. Here is my attempt at following the example above:

f_recipe = function(dataa, yy, xx) {
  recipe(dataa) |>
    update_role({{yy}}, new_role = 'outcome')
    update_role({{xx}}, new_role = 'predictor')
}

recc = f_recipe(df_train, [y_var], [variable_of_interest])

This gives the error: $ operator is invalid for atomic vectors

nirgrahamuk · November 26, 2021, 9:44am

A couple of questions for you.
Where do you set data_of_interest? It appears only at

df = select_variable_of_interest({{data_of_interest}},

Had you intended it to be set to original_data?

Later, at the recipe stage, what does 'text' refer to?

recc = recipe( [variable_of_interest] ~ text, data = df_train)

Finally, in step_normalize you reference x, but I don't see x mentioned anywhere else in your shared code. What did you intend to do here?

pathos · November 26, 2021, 10:06am

Ah, sorry about that. All three should now be fixed in the code above.

nirgrahamuk · November 26, 2021, 11:07am

This is the first approach that comes to my mind.

recc = recipe(as.formula(paste0(y_var, " ~ ", variable_of_interest)), data = df_train) %>%
  step_normalize({{ variable_of_interest }})
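
If the goal is to wrap the recipe in a function, one possible alternative is to pass the column names as strings and select them with all_of(), so no formula has to be pasted together. The sketch below is an assumption rather than something tested in this thread: f_recipe reuses the name from the question, the seed and the two-column df_train are made up for illustration, and it assumes a recipes version in which update_role() and step_normalize() accept tidyselect helpers. It also shows base R's reformulate() as a possible replacement for as.formula(paste0(...)).

```r
library(recipes)  # attaches dplyr, which provides all_of()

# Hypothetical helper: yy and xx are column names given as strings.
f_recipe <- function(dataa, yy, xx) {
  recipe(dataa) |>
    update_role(all_of(yy), new_role = "outcome") |>
    update_role(all_of(xx), new_role = "predictor") |>
    step_normalize(all_of(xx))
}

# Illustrative data only; the seed and values are not from the thread.
set.seed(1)
df_train <- data.frame(yy = runif(50, 1, 100),
                       xx = runif(50, 1, 100))

y_var <- "yy"
variable_of_interest <- "xx"

recc <- f_recipe(df_train, y_var, variable_of_interest)
summary(recc)  # lists each variable with its assigned role

# Formula-based variant: reformulate() builds yy ~ xx from the two strings.
recc_formula <- recipe(reformulate(variable_of_interest, response = y_var),
                       data = df_train) |>
  step_normalize(all_of(variable_of_interest))
```

Note the pipe after each update_role() call. In the f_recipe from the question the first update_role() is not piped into the second, so the second call receives the column name as its first argument instead of a recipe, which may be where the "$ operator is invalid for atomic vectors" error comes from.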
