Understanding Conditional Inference Forests Variable Importance

Disclosure: I haven't done much with forestry, but I am pretty good at reading function signatures.

Let's start at the top.

To find: Implementation of point two

  1. select all Z
  2. apply some function f(Z) to find the cutpoint of the tree
  3. bisect the sample space at each cutpoint (steps 2 and 3 are sketched in the toy example below)
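
For my own intuition, here is a toy sketch in base R (not party's actual code) of what steps 2 and 3 might look like for a single, already-chosen covariate z. How z gets chosen in the first place is exactly the issue below.

```r
# Toy sketch only, not party's implementation: given an already-chosen
# covariate z, score every candidate cutpoint, keep the best one, then
# bisect the sample on it.
best_cutpoint <- function(y, z) {
  cuts <- head(sort(unique(z)), -1)          # candidate cutpoints (drop the max)
  scores <- vapply(cuts, function(cp) {
    # crude two-group contrast; party uses permutation-test statistics instead
    abs(mean(y[z <= cp]) - mean(y[z > cp]))
  }, numeric(1))
  cuts[which.max(scores)]
}

set.seed(1)
z <- runif(200)
y <- as.numeric(z > 0.6) + rnorm(200, sd = 0.2)

cp <- best_cutpoint(y, z)
cp                          # should land near the true step at 0.6
left  <- which(z <= cp)     # the bisected sample space ...
right <- which(z >  cp)     # ... one half on each side of the cutpoint
```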

Issue: How to select Z?

Hypothesis: varimp(values) & conditional = TRUE

To find: values

Suggested approach.

  1. Confirm that party::varimp does this.
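
A quick sanity check of that hypothesis, assuming only that party is installed: does varimp even expose a conditional argument?

```r
library(party)

"conditional" %in% names(formals(varimp))  # TRUE would support the hypothesis
args(varimp)                               # and the full signature for context
```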

From the description

[The party package addresses] a statistical approach [to recursive partitioning] which takes into account the distributional properties of the measures.

From the vignette

We need to decide whether there is any information about the response variable covered by any of the m covariates.

When we are not able to reject H_0 at a pre-specified level α, we stop the recursion [in ctree].

This leads to ctree as a starting point for selecting Z. ctree itself returns a BinaryTree object; cforest grows an ensemble of such conditional inference trees, and the resulting forest object is what varimp operates on.
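
To watch that selection happen in a single tree, a small ctree fit along the lines of the help-page example (airquality ships with base R) prints, at each inner node, the split variable Z, an association criterion, and the cutpoint at which the sample is bisected.

```r
library(party)

# A single conditional inference tree: the printed tree shows, for each inner
# node, the selected covariate Z, an association criterion, and the cutpoint
# used to bisect the sample.
airq  <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq)

print(airct)
plot(airct)   # same information, drawn as a tree
```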

If conditional = TRUE, the importance of each variable is computed by permuting within a grid defined by the covariates that are associated (with 1 - p-value greater than threshold) to the variable of interest. The resulting variable importance score is conditional in the sense of beta coefficients in regression models, but represents the effect of a variable in both main effects and interactions.
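
Putting that together on the readingSkills data that ships with party (simulated so that shoeSize correlates with age but has no effect of its own on score), the conditional score should shrink shoeSize's importance relative to the marginal one. The mtry/ntree values below are arbitrary small choices; threshold = 0.2 is, per the help page, the default grid cut-off.

```r
library(party)

set.seed(42)
# Small conditional inference forest; mtry and ntree are arbitrary here
cf <- cforest(score ~ nativeSpeaker + age + shoeSize, data = readingSkills,
              controls = cforest_unbiased(mtry = 2, ntree = 50))

cor(readingSkills$shoeSize, readingSkills$age)   # the correlated pair of Zs

varimp(cf)                                       # marginal permutation importance
varimp(cf, conditional = TRUE, threshold = 0.2)  # permute within the grid of
                                                 # covariates with 1 - p-value > threshold
```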

Winding back: each Z is a covariate of the response variable Y in the original formula argument. The covariates may well be correlated with Y, but it is not the raw correlation that gets manipulated from that point on; it is derived statistics, namely the association tests and p-values quoted above.

What I've tried to do here is illustrate how I unwind questions like this. Hope it's helpful.
