Problem with variable importance in caret

jsalinas · November 22, 2023, 4:56pm

I'm using caret to fit a binary classification model.

library(caret)

# 10-fold cross-validation
ctrl <- trainControl(method = "cv", number = 10)
set.seed(2023)
# CART (rpart), tuning the complexity parameter cp
modelo_cart <- train(
  data.train[, predictores], data.train[, target],
  method = "rpart", trControl = ctrl, metric = "Accuracy",
  tuneGrid = expand.grid(cp = seq(0, 0.05, 0.001))
)

modelo_cart
modelo_cart$bestTune
modelo_cart$finalModel

varImp(modelo_cart)

However, in the variable importance output, some variables appear twice. For example, 17_falta_apetito appears once with importance 70.87 and then again, quoted in backticks as `17_falta_apetito`, with importance 0.00.

                            Overall
18_dificil_decidir           100.00
9_ansiedad                    96.76
2_fatiga_cronica              80.19
**17_falta_apetito            70.87**
14_conflictivo                68.27
22_actividades_evaluativas    35.10
7_intranquilidad              34.15
24_no_entiende_clase          17.04
10_falta_concentracion         7.80
26_poco_tiempo_actvividad      6.67
**`17_falta_apetito`           0.00**
`6_somnolencia`                0.00
`9_ansiedad`                   0.00
`1_trastorno_sueno`            0.00
genero_X4                      0.00
`13_carga_economica`           0.00
`23_tipos_actividades`         0.00
`12_falta_motivacion`          0.00
`10_falta_concentracion`       0.00
`3_dolor_cabeza`               0.00

I don't understand the reason. When I use C5.0 or Random Forest instead, everything works fine.

I would appreciate any help or suggestions.

nirgrahamuk · November 22, 2023, 5:25pm

Thanks for providing code. Could you kindly take further steps to make it easier for other forum users to help you? Share some representative data that will enable your code to run and show the problematic behaviour.

How do I share data for a reprex?

You might use tools such as the datapasta package, or the base function dput(), to share a portion of your data in code form, i.e. in a form that can be copied from the forum and pasted into an R session.
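As a minimal sketch of the dput() approach: the data frame `ejemplo`, its values, and the factor level "Sin_depresion" below are made up for illustration and are not the poster's data. Only two of the column names are borrowed from the thread.

```r
# Hypothetical stand-in for a few rows of data.train (values are invented)
ejemplo <- data.frame(
  `9_ansiedad`       = c(0.25, 0.5, 0),
  `17_falta_apetito` = c(0.25, 0, 0.75),
  depresion          = factor(c("Con_depresion", "Sin_depresion", "Con_depresion")),
  check.names        = FALSE  # keep the names exactly as written
)

# dput() prints R code that rebuilds the object; paste that output into the post
dput(head(ejemplo, 3))
```

Anyone reading the post can then assign the pasted structure(...) call to a name and get the same data frame back, including column names and factor levels.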

Reprex Guide

jsalinas · November 22, 2023, 6:04pm

Thanks

data.train[1,]
  1_trastorno_sueno 2_fatiga_cronica 3_dolor_cabeza
1              0.25                0            0.5
  5_rascar_morder_una 6_somnolencia 7_intranquilidad 9_ansiedad
1                0.25             0                0       0.25
  10_falta_concentracion 11_agresivo_irritable 12_falta_motivacion
1                    0.5                     0                   0
  13_carga_economica 14_conflictivo 15_aislamiento 17_falta_apetito
1                  0              0              0             0.25
  18_dificil_decidir 21_caracter_profesor 22_actividades_evaluativas
1             0.1429                 0.25                       0.25
  23_tipos_actividades 24_no_entiende_clase
1                  0.5                  0.5
  26_poco_tiempo_actvividad     depresion genero_X2 genero_X3
1                      0.75 Con_depresion         1         0
  genero_X4
1         0

The numeric variables were scaled to the range [0, 1], and the categorical variable was converted to dummy variables.
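The thread does not show how this preprocessing was done. The following is only a base-R sketch of one common way to do it, assuming min-max scaling and treatment-coded dummies. The helper `escala_01`, the data frame `datos`, and its values are hypothetical, and the dummy column names it produces differ from the genero_X2 style shown above.

```r
# Min-max scaling to [0, 1]; assumes x is numeric and max(x) > min(x)
escala_01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical raw data: one numeric score and one categorical variable
datos <- data.frame(
  puntaje = c(0, 2, 4, 8),
  genero  = factor(c("1", "2", "3", "4"))
)

datos$puntaje <- escala_01(datos$puntaje)

# model.matrix() builds one 0/1 column per non-reference level;
# dropping the first column removes the intercept
dummies <- model.matrix(~ genero, data = datos)[, -1, drop = FALSE]

resultado <- cbind(datos["puntaje"], dummies)
print(resultado)
```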

nirgrahamuk · November 22, 2023, 8:08pm

I cannot reproduce your issue with what you shared here.
Perhaps try one of the suggestions above (datapasta or dput()) to share the data?
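One possible explanation, not confirmed anywhere in this thread: the rows with importance 0.00 carry backtick-quoted names, and R adds backticks when a formula is built from non-syntactic column names, such as names that start with a digit. If the model's split variables are stored without backticks while the full variable list is stored with them, the two lists would not match and the same variable could be listed twice. The sketch below only illustrates the naming behaviour with base R, and shows make.names() as a possible workaround. The data frame `df` and its values are hypothetical.

```r
# Column names copied from the output shown in the thread
nombres <- c("17_falta_apetito", "9_ansiedad", "genero_X4")

# make.names() converts non-syntactic names into syntactic ones
nombres_limpios <- make.names(nombres, unique = TRUE)
print(nombres_limpios)

# Hypothetical stand-in data with a name that starts with a digit
df <- data.frame(
  `17_falta_apetito` = c(0.25, 0, 0.5),
  y                  = c(1, 0, 1),
  check.names        = FALSE
)

# Term labels as R builds them from a formula, before renaming
print(attr(terms(y ~ ., data = df), "term.labels"))

# Same term labels after renaming the columns to syntactic names
names(df) <- make.names(names(df), unique = TRUE)
print(attr(terms(y ~ ., data = df), "term.labels"))
```

Applied to the original code, this would mean renaming the columns of data.train (and updating predictores to match) before calling train(). Whether that removes the duplicated rows from varImp() has not been tested here.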
