Hi,
I have a MARS model and I am trying to run permutation tests based on AUC for a binary classification problem.
I am using the iml package to do this. iml supports a range of loss functions taken from the Metrics package (p. 5 of the iml documentation), and it also lets you supply your own. AUC is not one of the built-in options, even though Metrics does have an auc() function. Quoting from the iml documentation on page 9:
"The loss function can be either specified via a string, or by handing a function to FeatureImp(). If you want to use your own loss function it should have this signature: function(actual, predicted). Using the string is a shortcut to using loss functions from the Metrics package. Only use functions that return a single performance value, not a vector. Allowed losses are: "ce", "f1", "logLoss", "mae", "mse", "rmse", "mape", "mdae", "msle", "percent_bias", "rae", "rmse", "rmsle","rse", "rrse", "smape" See library(help = "Metrics") to get a list of functions."
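Going by that signature, a custom loss only has to take `actual` and `predicted` and return one number. FeatureImp() measures how much the error grows when a feature is shuffled, so the value is read as "lower is better", and AUC would have to be flipped, for example to 1 - AUC. Below is a minimal sketch in base R, assuming a 0/1 target; `auc_loss` is a hypothetical name, and it uses the usual rank-based (Mann-Whitney) AUC formula, so `Metrics::auc(actual, predicted)` could be used in its place.

```r
# Hypothetical helper (not part of iml or Metrics): a loss with the
# function(actual, predicted) signature that FeatureImp() accepts.
# Assumes actual is coded 0/1; AUC is flipped to 1 - AUC so lower is better.
auc_loss <- function(actual, predicted) {
  # Coerce to plain numeric vectors in case one-column data frames are passed
  actual <- as.numeric(unlist(actual))
  predicted <- as.numeric(unlist(predicted))

  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)

  # Rank-based (Mann-Whitney) AUC; tied predictions get averaged ranks
  r <- rank(predicted)
  auc <- (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

  1 - auc
}

auc_loss(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))
#> [1] 0.25
```

Because AUC only depends on the ranking of the predictions, this loss gives the same result whether the model returns probabilities or link-scale scores.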
I'm at a loss as to how to implement this, since I can't find an example in the documentation. Can anyone help? Below is a minimal example based on the iris dataset.
library(iml)
#> Warning: package 'iml' was built under R version 3.5.2
library(Metrics)
#> Warning: package 'Metrics' was built under R version 3.5.2
library(earth)
#> Loading required package: plotmo
#> Loading required package: plotrix
#> Loading required package: TeachingDemos
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.1
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Create a dataframe with a binary flag
mydf <- iris %>%
  mutate(dep = ifelse(Species == 'setosa', 1, 0)) %>%
  select(-Species)
# Fit a MARS model (with a binomial GLM) to predict the binary flag
mod <- earth(dep ~ ., data = mydf, glm = list(family = binomial, maxit = 100))
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Setup for the iml package
X <- mydf %>% select(-dep)
y <- mydf %>% select(dep)
mymodel <- Predictor$new(mod, data = X, y = y)
# This works
FeatureImp$new(mymodel, loss = 'mae', n.repetitions = 10, run = TRUE)
#> Interpretation method: FeatureImp
#> error function: mae
#>
#> Analysed predictor:
#> Prediction task: unknown
#>
#>
#> Analysed data:
#> Sampling from data.frame with 150 rows and 4 columns.
#>
#> Head of results:
#> feature importance.05 importance importance.95 permutation.error
#> 1 Sepal.Length 1.0284097 1.0408661 1.1081485 36.52965
#> 2 Sepal.Width 0.9880988 0.9958539 1.0069320 34.94992
#> 3 Petal.Length 0.4852722 0.6511579 0.7196340 22.85267
#> 4 Petal.Width 0.4103780 0.6475188 0.7025534 22.72495
# I would like to get this to work
FeatureImp$new(mymodel, loss = 'auc', n.repetitions = 10, run = TRUE)
#> Error in withCallingHandlers({: Assertion on 'loss' failed: Must be element of set {'ce','f1','logLoss','mae','mse','rmse','mape','mdae','msle','percent_bias','rae','rmsle','rse','rrse','smape'}, but is 'auc'.
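Going by the documentation quoted earlier, the assertion only rejects "auc" as a string; a function should bypass that check. Below is a minimal sketch, assuming that handing FeatureImp$new() a function is enough and that the installed iml version has the `compare` argument. `auc_loss` is a hypothetical helper, and `predict.function` is used to get probabilities because predict.earth() defaults to the link scale (the large mae values above are consistent with that). Since setosa is perfectly separable, the unpermuted AUC is likely 1, making 1 - AUC zero, so the default ratio comparison would divide by zero; `compare = "difference"` avoids that.

```r
library(iml)
library(earth)
library(Metrics)

# Same binary target as in the example above: 1 for setosa, 0 otherwise
mydf <- iris[, 1:4]
mydf$dep <- as.numeric(iris$Species == "setosa")

mod <- earth(dep ~ ., data = mydf, glm = list(family = binomial, maxit = 100))

# Assumption: predict.earth() returns link-scale values by default, so ask for
# probabilities explicitly. AUC only depends on the ranking of the predictions,
# so this mostly matters for losses such as "mae".
mymodel <- Predictor$new(
  mod,
  data = mydf[, 1:4],
  y = mydf$dep,
  predict.function = function(model, newdata) {
    predict(model, newdata, type = "response")[, 1]
  }
)

# Hypothetical helper: FeatureImp() treats the value as an error,
# so flip AUC to 1 - AUC
auc_loss <- function(actual, predicted) {
  1 - Metrics::auc(as.numeric(unlist(actual)), as.numeric(unlist(predicted)))
}

# compare = "difference" avoids dividing by an unpermuted loss of 0
imp <- FeatureImp$new(mymodel, loss = auc_loss, compare = "difference",
                      n.repetitions = 10)
imp$results
```

With `compare = "difference"`, each importance value should read as the drop in AUC when that feature is shuffled, since (1 - AUC_permuted) - (1 - AUC_original) = AUC_original - AUC_permuted.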