Understanding Conditional Inference Forests Variable Importance

john.smith · March 21, 2020, 4:21pm

Hi,

I'm currently reading a paper called "Conditional variable importance for random forests" and am trying to get a handle on how the variable importance works. I am finding the four steps outlined below very dense, and I was wondering if someone could clarify them so I can get a conceptual understanding of why variable importance in Conditional Inference Forests differs from that in Random Forest. I have reproduced the steps below without the maths formulae.

1. In each tree compute the oob-prediction accuracy before the permutation
2. For all variables Z to be conditioned on: Extract the cutpoints that split this variable in the current tree and create a grid by means of bisecting the sample space in each cutpoint.
3. Within this grid permute the values of X_j and compute the oob-prediction accuracy after permutation
4. The difference between the prediction accuracy before and after the permutation accuracy again gives the importance of X_j for one tree
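
A minimal base R sketch of these four steps for a single tree may help. It is not the party implementation: the data are simulated, the whole sample stands in for the out-of-bag observations, and `predict_tree`, `permute_within` and the cutpoint 0.5 are hypothetical names and values made up for illustration.

```r
set.seed(1)

# Simulated stand-in for the out-of-bag sample of one tree (assumption).
n  <- 300
z  <- rnorm(n)                   # covariate Z to condition on
xj <- z + rnorm(n, sd = 0.5)     # variable of interest X_j, correlated with Z
y  <- ifelse(z > 0, "a", "b")    # response depends on Z only

# Hypothetical fitted tree: splits on Z at 0.5, then on X_j at 0.
predict_tree <- function(z, xj) ifelse(z > 0.5 | xj > 0, "a", "b")
accuracy <- function(pred, truth) mean(pred == truth)

# Permute x separately inside each cell of the grid.
permute_within <- function(x, cell) {
  out <- x
  for (lev in unique(cell)) {
    idx <- which(cell == lev)
    out[idx] <- x[idx][sample.int(length(idx))]
  }
  out
}

# Step 1: prediction accuracy before permutation.
acc_before <- accuracy(predict_tree(z, xj), y)

# Step 2: grid from the cutpoints this tree uses for Z (here only 0.5).
z_cutpoints <- 0.5
grid_cell <- cut(z, breaks = c(-Inf, z_cutpoints, Inf))

# Step 3: permute X_j within each grid cell and recompute the accuracy.
xj_cond <- permute_within(xj, grid_cell)
acc_after_cond <- accuracy(predict_tree(z, xj_cond), y)

# Step 4: importance of X_j for this one tree.
imp_conditional <- acc_before - acc_after_cond

# For comparison: the unconditional version permutes X_j over the whole sample.
imp_unconditional <- acc_before - accuracy(predict_tree(z, sample(xj)), y)

c(conditional = imp_conditional, unconditional = imp_unconditional)
```

The only difference between the two importances is where the shuffling happens: over the whole sample, or separately inside each cell defined by the cutpoints of Z.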

I am mostly getting bogged down on step two. I think Z basically means: if the correlation is above some amount, then permute the attributes together. This is based on the values fed into the varimp function in the party package with conditional set to TRUE.

Have I got the wrong end of the stick completely?

Thanks

technocrat · March 21, 2020, 8:28pm

Disclosure: I haven't done much with forestry. I am pretty good at reading the function signatures.

Let's start at the top.

To find: Implementation of point two

select all Z
apply some function f(Z) to find cutpoint of tree
bisect the sample space of each cutpoint

Issue: How to select Z?

Hypothesis: varimp(values) & conditional = TRUE

To find: values

Suggested approach.

Confirm that party::varimp does this.

From the description

[The party package addresses] a statistical approach [to recursive partitioning] which takes into account the distributional properties of the measures.

From the vignette

We need to decide whether there is any information about the response variable covered by any of the m covariates.

When we are not able to reject H_0 at a pre-specified level \alpha, we stop the recursion [in ctree]

This leads to ctree as a starting point for selecting Z. cforest grows an ensemble of such trees and returns the RandomForest class object used by varimp. From the varimp documentation:

If conditional = TRUE, the importance of each variable is computed by permuting within a grid defined by the covariates that are associated (with 1 - p-value greater than threshold) to the variable of interest. The resulting variable importance score is conditional in the sense of beta coefficients in regression models, but represents the effect of a variable in both main effects and interactions.

Winding back, each Z is a covariate to some response variable, Y in the original formula argument, which may produce a correlation, but it is not the correlation that is manipulated from that point, but derivative statistics.

What I've tried to do is illustrate how I try to unwind questions like this. Hope it's helpful.

john.smith · March 22, 2020, 7:27am

Thanks @technocrat for your reply.
If I understand your logic correctly, you are suggesting that Z is basically the statistical test they use to compare predictors? If the test comes back as significant, the variable is included in the group of attributes that are permuted? The cutoff for the 1 - p-value (I think this is the power of the test) for significance is set to 0.2 in the threshold argument of the function?

I could be totally off base but I'm basing this on this answer

technocrat · March 22, 2020, 7:55am

Hi, John,

Don't discount the possibility that I'm more lost than you are. I hope to oscillate between dazzling brilliance and useful idiot, but can't exclude the possibility of missing both.

My take is that Z is a coefficient in a model comparable to X_{i,j} in the link. For Z and a model, the test is on the cutpoints of a tree model using Z and some response variable Y. The test is against the null hypothesis H_0 that the cutpoint of Z tells us nothing more at \alpha "significance" (horrid word). So long as we reject H_0, we continue. When we fail, we stop.
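
The stop-or-continue rule described here can be sketched in base R. This is a rough illustration only: `kruskal.test` is a stand-in for the permutation tests that party actually computes, and `alpha = 0.05` and the Bonferroni adjustment are choices made for the example.

```r
# Binary target as used later in the thread: is the flower a setosa?
target <- factor(ifelse(iris$Species == "setosa", "yes", "no"))
covariates <- iris[, c("Sepal.Length", "Sepal.Width",
                       "Petal.Length", "Petal.Width")]

alpha <- 0.05  # pre-specified level, chosen for illustration

# One association test per covariate. kruskal.test is a stand-in,
# not the test statistic that party computes.
p_raw <- vapply(covariates,
                function(x) kruskal.test(x, target)$p.value,
                numeric(1))
p_adj <- p.adjust(p_raw, method = "bonferroni")

# H_0: none of the covariates carries information about the response.
if (min(p_adj) > alpha) {
  decision <- "stop: cannot reject H_0, this node becomes a leaf"
} else {
  decision <- paste("split on", names(which.min(p_adj)))
}
decision
```

The same test is repeated in every child node, so the recursion ends on its own once no covariate is significantly associated with the response.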

This stuff is hard, and I'm certainly trying to punch above my weight class.

john.smith · March 22, 2020, 11:30am

Here is my last ditch attempt to understand what is happening with a bit of code

I have seen this blog post which describes ctrees in a very understandable manner (for me in any case :). Unfortunately they do not go into the variable importance in great detail

According to the original authors 1, 2, the Z data-frame or conditional grid described in the original paper is a data-frame where the columns are the split rules of each individual decision tree. Here is where I get lost again :). The paper says

The set of variables Z to be conditioned on should contain all variables that are correlated with the current variable of interest X_j. In the varimp function, this is assured by the small default value 0.2 of the threshold argument: By default, all variables whose correlation with X_j meets the condition 1 - (p-value) > 0.2 are used for conditioning. A larger value of threshold would have the effect that only those variables that are strongly correlated with X_j would be used for conditioning, but would also lower the computational burden.
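
The selection rule in this quote can be sketched in base R. The sketch assumes a Pearson `cor.test` as a stand-in for the association test used inside varimp, and picks `Sepal.Width` from the iris predictors as a hypothetical variable of interest; `one_minus_p` and `conditioning_set` are names made up for the example.

```r
threshold <- 0.2            # default value quoted above
predictors <- iris[, 1:4]   # the four numeric iris measurements
xj_name <- "Sepal.Width"    # hypothetical variable of interest X_j

others <- setdiff(names(predictors), xj_name)

# 1 - p-value of an association test between X_j and every other predictor.
# cor.test is a stand-in for the test used inside varimp (assumption).
one_minus_p <- vapply(others, function(nm) {
  1 - cor.test(predictors[[xj_name]], predictors[[nm]])$p.value
}, numeric(1))

# Z: every predictor whose 1 - p-value exceeds the threshold.
conditioning_set <- others[one_minus_p > threshold]

round(one_minus_p, 3)
conditioning_set
```

With a threshold as low as 0.2, even a weakly associated predictor ends up in Z. Note that nothing is removed from the model here: the threshold only decides which covariates define the grid within which X_j is permuted.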

So if I understand this correctly,

1. You first determine the variables to use, using the threshold on 1 - (p-value).
2. You remove all variables with a lower correlation than the threshold (this can be a correlation for numeric variables or p-values for other measures of association).
3. Then you generate the trees with the new set of attributes.
4. You create your Z data-frame based on the cut-points (rules).
5. Within each grid generated from the tree you permute the variable you are interested in.
6. The difference between the non-permuted performance and the permuted performance gives you the variable importance.
7. This is then averaged over all of the trees to give the total result.

I think this would make more sense to me if I could see what Z looks like.
Z is created below based on the rules of a single tree, but where does the variable I want to use go within Z?
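
One way to picture the grid, as a sketch in base R rather than what party does internally: the grid is a cell label per observation, and the variable of interest is not a column of it but is shuffled within the cells. The cutpoint 1.9 on Petal.Length is the split from the single tree in this post; the cutpoint 5.5 on Sepal.Length and the choice of Petal.Width as the variable of interest are hypothetical.

```r
set.seed(2020)

# Cell membership from the Petal.Length split of the single tree.
cell_pl <- cut(iris$Petal.Length, breaks = c(-Inf, 1.9, Inf),
               labels = c("PL <= 1.9", "PL > 1.9"))

# Hypothetical second conditioning covariate with a made-up cutpoint.
cell_sl <- cut(iris$Sepal.Length, breaks = c(-Inf, 5.5, Inf),
               labels = c("SL <= 5.5", "SL > 5.5"))

# The grid is the cross-classification of all cuts: one cell per observation.
grid <- interaction(cell_pl, cell_sl, drop = TRUE)
table(grid)

# The variable of interest is shuffled separately inside every cell.
xj <- iris$Petal.Width
xj_perm <- ave(xj, grid, FUN = function(v) v[sample.int(length(v))])

head(data.frame(grid, xj, xj_perm))
```

Each row keeps its cell, and only the values of the variable of interest move between rows of the same cell.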

Apologies if I appear to be laboring the point

library(party)
library(janitor)
library(tidyverse)

# Create a dataframe where we are trying to predict Setosa
mydf <- iris %>% 
  mutate(set_tgt = factor(ifelse(Species == 'setosa', 'yes', 'no'))) %>% 
  select(-Species)

glimpse(mydf)
#> Observations: 150
#> Variables: 5
#> $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9,...
#> $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1,...
#> $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5,...
#> $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1,...
#> $ set_tgt      <fct> yes, yes, yes, yes, yes, yes, yes, yes, yes, yes,...

# We will try to predict "set_tgt"
cf_mod <- cforest(set_tgt ~ .,
                  data = mydf, 
                  controls = cforest_unbiased(mtry = 2, ntree = 3))

# With conditional = TRUE, varimp permutes each variable within a grid
# defined by the covariates whose 1 - p-value exceeds the threshold
varimp(cf_mod, conditional = TRUE, threshold = 0.2) %>% 
  enframe() %>% 
  arrange(desc(value))
#> # A tibble: 4 x 2
#>   name         value
#>   <chr>        <dbl>
#> 1 Petal.Length  0.40
#> 2 Sepal.Length  0   
#> 3 Sepal.Width   0   
#> 4 Petal.Width   0

# Finding Z
mod <- ctree(set_tgt ~ .,data = mydf)
plot(mod)


# Row names are the label we are trying to predict
Z <- tibble("Petal.Length <= 1.9" = 50,
            "Petal.Length > 1.9" = 0) %>% 
  bind_rows(tibble("Petal.Length <= 1.9" = 0,
                   "Petal.Length > 1.9" = 100)) %>% 
  data.frame() %>% 
  clean_names()

# The 50 flowers with Petal.Length <= 1.9 are the setosas ("yes")
row.names(Z) <- c("yes", "no")

Z
#>     petal_length_1_9 petal_length_1_9_2
#> yes               50                  0
#> no                 0                100

Created on 2020-03-22 by the reprex package (v0.2.1)

technocrat · March 22, 2020, 4:20pm

I’m on a tablet, but will get back to you later. The first thing that pops up is that a conditioning p-value of 0.001 is far lower than the recommended 0.2
