My dataset consists of a numeric variable (called "N4") and several categorical variables that affect it. For example, there is a categorical variable called "die": when it equals "alpha", N4 takes values around 100, and when it equals "beta", N4 takes values around 300.
My goal is to figure out which of the categorical variables most affects my numeric variable.
Does it make sense to turn the categorical variables into numerical variables and calculate correlations? Is there a more effective analysis?
Some models can handle factors directly and still tell you which variable matters most, but it is good practice to convert every categorical variable to dummy columns before training the model.
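Dummy columns of 1s and 0s carry the same information as the original categorical column, but many models need numeric input, and with dummies each level gets its own term, so you can see which levels shift the response variable.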
You can use step_dummy() in recipes, or its equivalents in other packages, to convert categorical variables into dummy columns.
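To show what the dummy conversion actually produces, here is a minimal sketch in plain Python rather than recipes. The helper name dummy_encode and the four sample rows are made up for illustration; only the names "die", "alpha", "beta" and "N4" come from the question.

```python
def dummy_encode(rows, column):
    """Replace one categorical column with a 0/1 column per level.

    Hypothetical helper for illustration; not part of any package.
    """
    # Collect the distinct levels, sorted so the column order is stable.
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator column per level.
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Made-up sample rows shaped like the data described in the question.
rows = [
    {"die": "alpha", "N4": 98.0},
    {"die": "alpha", "N4": 103.0},
    {"die": "beta", "N4": 297.0},
    {"die": "beta", "N4": 305.0},
]

encoded = dummy_encode(rows, "die")
for row in encoded:
    print(row)
```

This sketch keeps one column per level. If the model has an intercept, one level is usually dropped as the reference, because the full set of indicator columns always sums to 1 and is therefore collinear with the intercept.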