How do I find the best r-squared based on some data transformation parameters?

Hello. I have a use case in R and the solution is eluding me.

The code below produces two vectors and a tibble:

library(tidyverse)

lambda_values <- c(0.2, 0.3, 0.4)
alpha_values <- c(0.3, 0.5, 0.9)

set.seed(123)
data <- tibble(
    sales = runif(5, min = 1000, max = 1500),
    var_1 = runif(5, min = 12, max = 25),
    var_2 = runif(5, min = 75, max = 90),
)


I want to

  1. Apply a transformation to the data, using the following formula as an example: `(var * lambda_value) ^ alpha_value`, for both `var_1` and `var_2`.

  2. Run a linear regression on the transformed data using sales as the target.

  3. Return a tibble with the lambda and alpha values (for each var) that produced the highest r-squared (similar to the output of show_best() in the tune package). Basically, I need to run a separate regression on the transformed data for every possible combination of lambda and alpha values for each var.

This should get you started:

# Try every lambda/alpha pair for var_1 and keep the row with the highest R-squared
best_var1_transform <- expand_grid(l1 = lambda_values, a1 = alpha_values) %>%
  rowwise() %>%
  mutate(m = list(lm(sales ~ I((var_1 * l1) ^ a1), data)),
         rsq = summary(m)$r.squared) %>%
  { .[ which.max(.$rsq), ] }
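To cover both variables at once, the same idea can be extended to a grid over all four parameters. This is only a sketch: `fit_rsq`, `results` and `best` are names made up for illustration, and the toy data is regenerated so the block runs on its own.

```r
library(tidyverse)

lambda_values <- c(0.2, 0.3, 0.4)
alpha_values  <- c(0.3, 0.5, 0.9)

set.seed(123)
data <- tibble(
  sales = runif(5, min = 1000, max = 1500),
  var_1 = runif(5, min = 12, max = 25),
  var_2 = runif(5, min = 75, max = 90)
)

# Hypothetical helper: fit one regression for a given parameter set
# and return its R-squared.
fit_rsq <- function(l1, a1, l2, a2) {
  fit <- lm(sales ~ I((var_1 * l1)^a1) + I((var_2 * l2)^a2), data = data)
  summary(fit)$r.squared
}

# 3 x 3 x 3 x 3 = 81 combinations, one regression per row.
results <- expand_grid(l1 = lambda_values, a1 = alpha_values,
                       l2 = lambda_values, a2 = alpha_values) %>%
  mutate(rsq = pmap_dbl(list(l1, a1, l2, a2), fit_rsq)) %>%
  arrange(desc(rsq))

# Top rows, loosely in the spirit of show_best().
best <- slice_head(results, n = 5)
best
```

Because a positive lambda only rescales each predictor, rows that differ only in lambda should show the same R-squared up to rounding, so ties among the top rows are expected.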

I think you should use optim() to pick the transformation instead.
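A rough sketch of what that could look like, under several assumptions of mine: the objective `neg_rsq` is a made-up helper, the search bounds and starting values are arbitrary, and lambda is held at 1 because a positive constant inside the power only rescales the predictor and so does not change R-squared. Note that optim() searches a continuous range, so the result need not be one of the listed alpha values.

```r
# Toy data generated the same way as in the question.
set.seed(123)
data <- data.frame(
  sales = runif(5, min = 1000, max = 1500),
  var_1 = runif(5, min = 12, max = 25),
  var_2 = runif(5, min = 75, max = 90)
)

# optim() minimises, so the objective returns the negative R-squared.
# par[1] and par[2] are the alpha exponents for var_1 and var_2.
neg_rsq <- function(par, lambda = 1) {
  a1 <- par[1]
  a2 <- par[2]
  fit <- lm(sales ~ I((var_1 * lambda)^a1) + I((var_2 * lambda)^a2),
            data = data)
  -summary(fit)$r.squared
}

opt <- optim(
  par    = c(0.5, 0.5),   # assumed starting values
  fn     = neg_rsq,
  method = "L-BFGS-B",    # box-constrained, keeps the exponents positive
  lower  = c(0.1, 0.1),   # assumed search bounds
  upper  = c(2, 2)
)

opt$par      # alpha estimates for var_1 and var_2
-opt$value   # R-squared at those estimates
```

This only finds a local optimum from the given start, and with five observations the R-squared surface says little about out-of-sample fit, so treat the result as illustrative.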
