# How do I find the best r-squared based on some data transformation parameters?

Hello. I have a use case in R and the solution is eluding me.

The code below produces two vectors and a tibble:

```r
library(tidyverse)

# Candidate transformation parameters
lambda_values <- c(0.2, 0.3, 0.4)
alpha_values <- c(0.3, 0.5, 0.9)

set.seed(123)
data <- tibble(
  sales = runif(5, min = 1000, max = 1500),
  var_1 = runif(5, min = 12, max = 25),
  var_2 = runif(5, min = 75, max = 90)
)
```

I want to:

1. Apply a transformation to the data, using the following formula as an example: `(var * lambda_value) ^ alpha_value`, for both `var_1` and `var_2`.

2. Run a linear regression on the transformed data using `sales` as the target.

3. Return a tibble with the lambda and alpha values (for each var) that produced the highest r-squared, similar to the output of `show_best()` in the `tune` package. Basically, I need to run a separate regression on the transformed data for every possible combination of lambda and alpha values for each var.

This should get you started:

```r
# Grid-search lambda and alpha for var_1 only
best_var1_transform <- expand_grid(l1 = lambda_values, a1 = alpha_values) %>%
  rowwise() %>%
  mutate(
    m = list(lm(sales ~ I((var_1 * l1)^a1), data)),  # one model per row
    rsq = summary(m)$r.squared
  ) %>%
  { .[which.max(.$rsq), ] }  # keep the row with the highest r-squared
```
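To cover both variables at once (item 3 in the question), one option is to extend the same grid idea to all four parameters. This is only a sketch: `fit_rsq()` is a hypothetical helper, the column names `l1`, `a1`, `l2`, `a2` are made up for illustration, and returning the top 5 rows is an arbitrary choice meant to loosely mimic `show_best()`.

```r
library(tidyverse)

lambda_values <- c(0.2, 0.3, 0.4)
alpha_values <- c(0.3, 0.5, 0.9)

set.seed(123)
data <- tibble(
  sales = runif(5, min = 1000, max = 1500),
  var_1 = runif(5, min = 12, max = 25),
  var_2 = runif(5, min = 75, max = 90)
)

# Hypothetical helper: transform both predictors with one parameter set,
# fit the regression, and return its r-squared.
fit_rsq <- function(l1, a1, l2, a2) {
  transformed <- data %>%
    mutate(
      var_1 = (var_1 * l1)^a1,
      var_2 = (var_2 * l2)^a2
    )
  summary(lm(sales ~ var_1 + var_2, data = transformed))$r.squared
}

# Every combination of lambda and alpha for each variable (3^4 = 81 rows)
results <- expand_grid(
  l1 = lambda_values, a1 = alpha_values,
  l2 = lambda_values, a2 = alpha_values
) %>%
  mutate(rsq = pmap_dbl(list(l1, a1, l2, a2), fit_rsq)) %>%
  arrange(desc(rsq))

# Top 5 parameter sets by r-squared
best <- slice_head(results, n = 5)
best
```

One thing to be aware of: with this particular example formula, `(var * lambda)^alpha` equals `lambda^alpha * var^alpha` for positive values, so lambda only rescales the predictor and r-squared does not change with it. Expect ties across lambda for any given pair of alphas; a formula where lambda enters differently would not have this property.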

I think you should use `optim()` to pick the transformation instead.
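A minimal sketch of what that might look like in base R, under several assumptions. Lambda is held fixed at 0.3 (an arbitrary value), since for positive inputs it only rescales the predictor in this formula and leaves r-squared unchanged, so the search runs over the two alphas only. The starting values and the bounds of 0.1 to 1 are also assumptions, and `neg_rsq()` is a hypothetical helper. A plain `data.frame` stands in for the tibble so that no packages are needed.

```r
set.seed(123)
data <- data.frame(
  sales = runif(5, min = 1000, max = 1500),
  var_1 = runif(5, min = 12, max = 25),
  var_2 = runif(5, min = 75, max = 90)
)

# Hypothetical objective: negative r-squared, because optim() minimises.
# par = c(a1, a2); l1 and l2 are fixed by assumption.
neg_rsq <- function(par, l1 = 0.3, l2 = 0.3) {
  x1 <- (data$var_1 * l1)^par[1]
  x2 <- (data$var_2 * l2)^par[2]
  -summary(lm(data$sales ~ x1 + x2))$r.squared
}

# Box-constrained search over the two alphas
fit <- optim(
  par = c(0.5, 0.5),
  fn = neg_rsq,
  method = "L-BFGS-B",
  lower = c(0.1, 0.1),
  upper = c(1, 1)
)

fit$par     # alphas found by the optimiser
-fit$value  # corresponding r-squared
```

Unlike the grid search, this treats alpha as continuous, so the result need not be one of the values in `alpha_values`. It can also stop at a local optimum, so trying a few different starting points is sensible.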
